Implementation of a Hidden Web crawler

MAHAMEDI, Soundous; Supervisor:  SAOUDI, Lalia

Files

MAHAMEDI Soundous.PDF (9.25 MB)

Date

2015-06-10

Authors

MAHAMEDI, Soundous

Supervisor: SAOUDI, Lalia

Publisher

University of M'sila

Abstract

Current crawlers retrieve content only from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links; they ignore search forms and pages that require authorization or prior registration. In particular, they miss the tremendous amount of high-quality content "hidden" behind search forms in large searchable electronic databases. In this work, we provide a framework for addressing the problem of extracting content from this hidden Web. To that end, we have built a task-specific hidden Web crawler called the Intelligent Hidden Web Crawler (IHiWC). We describe the architecture of IHiWC and present a number of new techniques that went into its approach, design, and implementation. We also present results from the experiments we conducted to test and validate these techniques.

Keywords

Deep crawler, Hidden Web Crawling, forms classification, forms submission

URI

http://dspace.univ-msila.dz:8080//xmlui/handle/123456789/38681

Collections

Master Thesis
