Design of a Parallel and Scalable Crawler for the Hidden Web

Sonali Gupta, Komal Kumar Bhatia

Source Title: International Journal of Information Retrieval Research (IJIRR) 12(1)

ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781683182085 | DOI: 10.4018/IJIRR.289612

Cite Article

MLA

Gupta, Sonali, and Komal Kumar Bhatia. "Design of a Parallel and Scalable Crawler for the Hidden Web." IJIRR, vol. 12, no. 1, 2022, pp. 1-23. http://doi.org/10.4018/IJIRR.289612

APA

Gupta, S. & Bhatia, K. K. (2022). Design of a Parallel and Scalable Crawler for the Hidden Web. International Journal of Information Retrieval Research (IJIRR), 12(1), 1-23. http://doi.org/10.4018/IJIRR.289612

Chicago

Gupta, Sonali, and Komal Kumar Bhatia. "Design of a Parallel and Scalable Crawler for the Hidden Web," International Journal of Information Retrieval Research (IJIRR) 12, no. 1: 1-23. http://doi.org/10.4018/IJIRR.289612

Abstract

The WWW contains a huge amount of information from many different areas. This information may take the form of web pages, media, articles (research journals and magazines), blogs, and so on. A major portion of it resides in web databases and can be retrieved by submitting queries through the interface that each database offers; this portion is called the Hidden Web. An important challenge is to retrieve this enormous amount of information efficiently and make it accessible through crawling. In this paper, we present the architecture of a parallel crawler for the Hidden Web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel Hidden Web crawler (PSHWC) extracts and downloads the contents of Hidden Web databases both effectively and efficiently.
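
The abstract does not describe how PSHWC is built, so the following is only a minimal sketch of one way a domain-specific split could keep parallel workers from downloading the same content. It assumes that each query interface is labeled with exactly one topical domain and that each domain is handed to a single worker. The seed list, the domain labels, and the function names `partition_by_domain`, `crawl_domain`, and `run_parallel` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical seed list: (topical domain, query-interface URL).
# Assumption: every interface carries exactly one domain label.
SEEDS = [
    ("books", "http://example.org/books/search"),
    ("books", "http://example.com/library/find"),
    ("flights", "http://example.net/flights/query"),
    ("jobs", "http://example.org/jobs/search"),
]


def partition_by_domain(seeds):
    """Group interface URLs so that each domain forms one disjoint work unit."""
    groups = defaultdict(list)
    for domain, url in seeds:
        groups[domain].append(url)
    return dict(groups)


def crawl_domain(domain, urls):
    """Placeholder worker.

    A real Hidden Web crawler would fill in and submit the search form
    behind each URL; here we only record which worker unit handled it.
    """
    return [(domain, url) for url in urls]


def run_parallel(seeds, max_workers=4):
    """Run one task per domain; disjoint groups mean no URL is handled twice."""
    groups = partition_by_domain(seeds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(crawl_domain, d, u) for d, u in groups.items()]
        return [item for future in futures for item in future.result()]
```

Because the groups are disjoint by construction, the workers need no shared "already downloaded" set in this sketch; that property holds only under the one-label-per-interface assumption stated above.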