
DC field | Value | Language
dc.contributor.advisor | Sivilotti, Paul |
dc.creator | Sowald, Chad |
dc.date.accessioned | 2013-05-02T03:14:33Z |
dc.date.available | 2013-05-02T03:14:33Z |
dc.date.issued | 2009-05-13 |
dc.identifier.uri | http://hdl.handle.net/1811/54735 |
dc.description | Engineering: 1st Place (The Ohio State University Denman Undergraduate Research Forum) | en_US
dc.description.abstract | Today's Internet user has a limited amount of time to manually mine the Internet for content such as videos, images, and documents that they want to view. Much of the user's time is wasted overhead: clicking hyperlinks, waiting for pages to load, and actually downloading the content for offline viewing. Therefore, many users would benefit from an application that could automatically crawl and download a large amount of content from the Internet, so that users could browse and further filter the content offline at a much faster speed and without the unnecessary overhead. I have developed a web crawling and downloading program, File Harvest - written in C# and using the .NET framework - that allows the user to quickly configure the web crawling mechanism before starting it. The web crawler functions by following hyperlinks and examining each page it encounters along the way. The user specifies what web pages to crawl, how many levels of hyperlinks to crawl, and what types of content to download. The primary insight of the work is the value of combining crawling and downloading in a single program – something that related efforts have yet to do. The program uses various web page analysis techniques such as HTTP traffic proxying and static analysis of the page HTML to help the user find as much relevant content as possible to download. There are some limitations as to what can be found through crawling, and these limitations are the primary focus of the research going forward. In general, File Harvest can greatly expedite the discovery and downloading of media for users. | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartofseries | 2009 Richard J. and Martha D. Denman Undergraduate Research Forum. 14th | en_US
dc.subject | web crawling | en_US
dc.subject | mass downloading | en_US
dc.subject | http proxy | en_US
dc.subject | flash video detection | en_US
dc.subject | C# .NET Windows | en_US
dc.title | File Harvest: Targeted, Legal Crawling and Downloading of Online Media (Poster) | en_US
dc.type | Presentation | en_US
dc.type.genre | Poster | en_US
dc.rights.cc | Attribution-NonCommercial-NoDerivs 3.0 United States | en_US
dc.rights.ccuri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | en_US
dc.description.academicmajor | Academic Major: Computer Science and Engineering | en_US
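The abstract describes the crawler only at a high level: start from user-chosen pages, follow hyperlinks up to a user-chosen number of levels, and keep links whose content type the user wants. The following is a minimal Python sketch of that idea, not File Harvest's C# implementation. It assumes a breadth-first, depth-limited traversal, uses static HTML analysis only (no HTTP proxying), and approximates "content type" by file extension. The names `harvest`, `LinkExtractor`, `extension_of`, and the `fetch` callback are hypothetical, chosen for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import posixpath


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects href/src values from a page's static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extension_of(url):
    """Lower-case extension of the URL path (e.g. '.jpg'), or '' if none.

    Assumption: extension is used as a stand-in for the content type.
    """
    return posixpath.splitext(urlparse(url).path)[1].lower()


def harvest(start_urls, fetch, max_depth, wanted_exts):
    """Depth-limited breadth-first crawl (illustrative sketch).

    fetch(url) returns the page HTML as a string, or None if unavailable.
    Start pages are depth 0; pages linked from depth d are depth d + 1 and
    are crawled only while d < max_depth. Returns sorted media URLs.
    """
    queue = deque((url, 0) for url in start_urls)
    visited = set(start_urls)
    media = set()
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for raw in parser.links:
            # Resolve relative links against the current page; drop #fragments.
            link, _ = urldefrag(urljoin(url, raw))
            if extension_of(link) in wanted_exts:
                media.add(link)  # content to download
            elif depth < max_depth and link not in visited:
                visited.add(link)  # page to crawl one level deeper
                queue.append((link, depth + 1))
    return sorted(media)


if __name__ == "__main__":
    # Hypothetical in-memory "web" so the sketch runs without network access.
    pages = {
        "http://example.com/": '<a href="gallery.html">Gallery</a> <img src="logo.png">',
        "http://example.com/gallery.html": '<a href="/media/clip.flv">clip</a> <a href="deep.html">more</a>',
        "http://example.com/deep.html": '<a href="photo.jpg">photo</a>',
    }
    print(harvest(["http://example.com/"], pages.get, 1, {".png", ".flv", ".jpg"}))
```

With `max_depth=1`, the sample crawl visits the start page and `gallery.html` but not `deep.html`, so `photo.jpg` is not collected; raising the limit to 2 picks it up. Passing `fetch` as a parameter keeps the traversal logic separate from networking, so a real HTTP client (or, as the abstract mentions, a proxy that observes traffic) could be substituted.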



Items in Knowledge Bank are protected by copyright, with all rights reserved, unless otherwise indicated.



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.