Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques

Karthikeyan T., Karthik Sekaran, Ranjith D., Vinoth Kumar V., Balajee J M
Copyright: © 2019 | Volume: 11 | Issue: 2 | Pages: 12
ISSN: 1938-0194 | EISSN: 1938-0208 | EISBN13: 9781522565192 | DOI: 10.4018/IJWP.2019070103
Cite Article

MLA

Karthikeyan T., et al. "Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques." IJWP vol.11, no.2 2019: pp.41-52. http://doi.org/10.4018/IJWP.2019070103

APA

Karthikeyan T., Sekaran, K., Ranjith D., Vinoth Kumar V., & Balajee J M. (2019). Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques. International Journal of Web Portals (IJWP), 11(2), 41-52. http://doi.org/10.4018/IJWP.2019070103

Chicago

Karthikeyan T., et al. "Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques," International Journal of Web Portals (IJWP) 11, no.2: 41-52. http://doi.org/10.4018/IJWP.2019070103

Abstract

Web scraping is a technique for automatically extracting information from web documents. It retrieves content relevant to a query, then aggregates the data and transforms it from an unstructured format into a structured representation. Text classification is a vital phase for summarizing the data and categorizing the webpages adequately. In this article, data is first extracted from websites using effective web scraping methodologies and then transformed into a structured form. The documents are classified and labeled based on keywords from the data. A recursive feature elimination technique is applied to the data to select the best candidate feature subset. Standard machine learning algorithms are then trained on the final dataset. The proposed model classifies the documents from the extracted data well, with an improved accuracy rate.
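
The first stages described in the abstract (extracting page content, putting it into a structured form, and labeling documents by keyword) might be sketched as follows. This is only an illustration using Python's standard library, not the authors' implementation: the names `PageTextParser`, `to_record`, `label_record`, and `CATEGORY_KEYWORDS`, the sample HTML, and the keyword lists are all hypothetical, and the feature elimination and model training stages are left out.

```python
import re
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Collects the <title> text and the <p> text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None    # "title" or "p" while inside one of those tags
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._tag = None


# Hypothetical categories and keywords; the abstract does not list the real ones.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score"},
    "technology": {"software", "data", "algorithm"},
}


def to_record(url, html):
    """Turn unstructured HTML into a structured record (a plain dict)."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "text": " ".join(parser.paragraphs)}


def label_record(record):
    """Label a record with the category whose keywords it matches most."""
    content = (record["title"] + " " + record["text"]).lower()
    words = set(re.findall(r"[a-z]+", content))
    counts = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unlabeled"


# Assumed sample page, used in place of a live HTTP request.
SAMPLE_HTML = """<html><head><title>New scraping software</title></head>
<body><p>The <b>algorithm</b> turns raw pages into structured data.</p>
<p>It runs nightly.</p></body></html>"""

record = to_record("https://example.com/post", SAMPLE_HTML)
record["label"] = label_record(record)
print(record)
```

In a fuller pipeline, records like these would be converted into numeric features before recursive feature elimination and classifier training; those steps depend on choices the abstract does not specify.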
