An Improved Text Extraction Approach With Auto Encoder for Creating Your Own Audiobook

Shakkthi Rajkumar, Shruthi Muthukumar, Aparna S. S., Angelin Gladston
Copyright: © 2022 | Volume: 12 | Issue: 1 | Pages: 17
ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781683182085 | DOI: 10.4018/IJIRR.289570

MLA

Rajkumar, Shakkthi, et al. "An Improved Text Extraction Approach With Auto Encoder for Creating Your Own Audiobook." IJIRR vol.12, no.1 2022: pp.1-17. http://doi.org/10.4018/IJIRR.289570

APA

Rajkumar, S., Muthukumar, S., Aparna S. S., & Gladston, A. (2022). An Improved Text Extraction Approach With Auto Encoder for Creating Your Own Audiobook. International Journal of Information Retrieval Research (IJIRR), 12(1), 1-17. http://doi.org/10.4018/IJIRR.289570

Chicago

Rajkumar, Shakkthi, et al. "An Improved Text Extraction Approach With Auto Encoder for Creating Your Own Audiobook," International Journal of Information Retrieval Research (IJIRR) 12, no.1: 1-17. http://doi.org/10.4018/IJIRR.289570

Abstract

Listening can make learning easier and more interesting than reading. An audiobook application is software that converts text to speech. However, the audiobook applications available on the market are neither free nor affordable for everyone, and they are meant only for fiction such as stories, novels, or comics. A comprehensive review of the literature shows that very little in-depth work has been done on image-to-speech conversion. In this paper, we combine several strategies across the whole image-to-speech pipeline. First, deep learning techniques built around an autoencoder denoise the images fed to the system. Text is then extracted with OCR engines. To further improve the quality of the extracted text, a post-processing spell-check mechanism is incorporated. Our results show that with denoising and spell checking, the model achieves an accuracy of 98.11%, compared with 84.02% without any denoising or spell checking.
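The abstract does not describe the denoising network's architecture. As a rough illustration only, the sketch below is a toy, dependency-free denoising autoencoder: it learns to map randomly corrupted 3x3 binary "glyphs" back to their clean versions. The glyph data, layer sizes, learning rate, and the names `TinyDenoisingAutoencoder`, `add_noise`, and `train` are all hypothetical; a system for real scanned pages would more likely use a convolutional autoencoder in a framework such as PyTorch or Keras.

```python
import math
import random

# Hypothetical toy data: 3x3 binary "glyphs" flattened to 9 pixels (1 = ink, 0 = paper).
CLEAN_GLYPHS = [
    [1, 1, 1, 0, 1, 0, 0, 1, 0],  # "T"
    [1, 0, 0, 1, 0, 0, 1, 1, 1],  # "L"
    [1, 0, 1, 0, 1, 0, 1, 0, 1],  # "X"
]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def add_noise(pixels, flip_prob, rng):
    """Corrupt a binary image by flipping each pixel with probability flip_prob."""
    return [1 - p if rng.random() < flip_prob else p for p in pixels]


class TinyDenoisingAutoencoder:
    """Dense encoder/decoder with one sigmoid hidden layer, trained with SGD on MSE."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x):
        # Encoder: compress the (noisy) input into a smaller hidden code.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Decoder: reconstruct a full-size image from the code.
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def loss(self, noisy, clean):
        """Mean squared error between the reconstruction of `noisy` and `clean`."""
        _, y = self.forward(noisy)
        return sum((yk - ck) ** 2 for yk, ck in zip(y, clean)) / len(clean)

    def train_step(self, noisy, clean, lr):
        h, y = self.forward(noisy)
        n = len(clean)
        # Output-layer error: d(MSE)/dy times sigmoid'(z) = y * (1 - y).
        d2 = [2.0 * (yk - ck) / n * yk * (1.0 - yk) for yk, ck in zip(y, clean)]
        # Hidden-layer error, computed with the decoder weights before they change.
        d1 = [sum(d2[k] * self.w2[k][j] for k in range(len(d2))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for k, dk in enumerate(d2):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d1):
            for i, xi in enumerate(noisy):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def mean_loss(model, pairs):
    return sum(model.loss(noisy, clean) for noisy, clean in pairs) / len(pairs)


def train(model, epochs=1000, lr=1.0, flip_prob=0.1, seed=1):
    rng = random.Random(seed)
    for _ in range(epochs):
        for clean in CLEAN_GLYPHS:
            # Fresh corruption every step; the target is always the clean glyph.
            model.train_step(add_noise(clean, flip_prob, rng), clean, lr)
    return model


if __name__ == "__main__":
    eval_rng = random.Random(42)
    eval_pairs = [(add_noise(c, 0.1, eval_rng), c) for c in CLEAN_GLYPHS for _ in range(5)]
    model = TinyDenoisingAutoencoder(n_in=9, n_hidden=6)
    before = mean_loss(model, eval_pairs)
    train(model)
    print(f"reconstruction MSE before: {before:.3f}, after: {mean_loss(model, eval_pairs):.3f}")
```

Training on freshly corrupted copies of the same clean targets is what makes this a denoising autoencoder rather than a plain one: the network cannot simply copy its input, so it has to learn which pixels are likely to be noise.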
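The abstract does not name the spell checker used in post-processing. A minimal sketch of dictionary-based correction, in the style of Peter Norvig's well-known spelling corrector, shows the general idea: each OCR token is replaced by the most frequent known word within edit distance 2. The word counts and the names `WORD_COUNTS`, `correct_word`, and `correct_text` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical word-frequency table; a real system would count words in a large corpus.
WORD_COUNTS = Counter({
    "the": 500, "is": 300, "than": 120, "text": 60, "makes": 50, "learning": 40,
    "reading": 30, "speech": 25, "audiobook": 20, "listening": 15, "easier": 10,
})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Prefer the word itself, then known words at distance 1, then 2; pick the most frequent."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def correct_text(text):
    """Spell-correct every alphabetic token (lowercased); punctuation and spacing are kept."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0).lower()), text)


print(correct_text("listenlng makes leaming easler tban readinq."))
# -> listening makes learning easier than reading.
```

Ranking purely by frequency ignores context and OCR-specific confusions such as `rn` being read as `m`; a production checker would typically weight candidates with a character-confusion model or a language model.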
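The abstract reports accuracies of 98.11% and 84.02% but does not define the metric. A common choice for OCR output is character accuracy, 1 - (edit distance / reference length); the sketch below assumes that definition, and the paper may instead use a word-level or other measure. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """1 - (edit distance / reference length), clipped at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


ground_truth = "listening makes learning easier"
ocr_output = "listenlng makes leaming easler"
print(f"{character_accuracy(ground_truth, ocr_output):.2%}")  # 4 edits over 31 chars -> 87.10%
```

Comparing the same measure before and after preprocessing (for example, raw OCR output versus denoised and spell-checked output against one ground-truth transcript) is how an improvement like the one reported above would be quantified.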