Implementation of Tesseract Algorithm to Extract Text from Different Images

5 Pages Posted: 1 May 2020

Muskan Chawla

Bharati Vidyapeeth’s College of Engineering

Rachna Jain

Bharati Vidyapeeth’s College of Engineering

Preeti Nagrath

Bharati Vidyapeeth’s College of Engineering; University of Mumbai

Date Written: May 1, 2020

Abstract

Image processing is one of the fastest-growing fields in research and technology today. There is high demand for computer systems that can store the information available in newspapers and other hard-copy paper documents. One of the simplest ways to bring printed text into a computer system is to scan the paper. The scan can then be stored on the computer and edited if required. However, detecting text in the captured image is a very challenging task. This paper therefore presents an attempt to use the Tesseract algorithm to make it easier to extract text from images.

Keywords: Tesseract, Text Recognition, Optical Character Recognition, Flatbed scanner, Guilloche pattern, Leptonica
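
As an illustration of the workflow the abstract describes (scan a page, then recover its text with Tesseract), the following is a minimal sketch and not the authors' implementation. It assumes the third-party packages Pillow and pytesseract are installed and that the Tesseract binary is on the system PATH. The function names `extract_text` and `normalize_ocr_text` and the file name in the usage note are hypothetical, chosen only for this example.

```python
from pathlib import Path


def normalize_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace on each line and drop empty lines."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image file and return the cleaned-up text."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such image: {path}")

    # Imported lazily so normalize_ocr_text stays usable without the OCR stack.
    # Assumption: Pillow and pytesseract are installed, and the Tesseract
    # binary is available on PATH.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        # image_to_string hands the image to Tesseract and returns plain text.
        raw = pytesseract.image_to_string(image, lang=lang)
    return normalize_ocr_text(raw)
```

A scanned page saved as, say, `scanned_page.png` could then be processed with `print(extract_text("scanned_page.png"))`. The whitespace cleanup is kept separate from the OCR call because raw OCR output often contains stray spacing and blank lines, and a small pure function is easy to adjust for a particular document type.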

Suggested Citation

Chawla, Muskan and Jain, Rachna and Nagrath, Preeti, Implementation of Tesseract Algorithm to Extract Text from Different Images (May 1, 2020). Proceedings of the International Conference on Innovative Computing & Communications (ICICC) 2020, Available at SSRN: https://ssrn.com/abstract=3589972 or http://dx.doi.org/10.2139/ssrn.3589972

Muskan Chawla (Contact Author)

Bharati Vidyapeeth’s College of Engineering

New Delhi, 110063
India

Rachna Jain

Bharati Vidyapeeth’s College of Engineering

New Delhi, 110063
India

Preeti Nagrath

Bharati Vidyapeeth’s College of Engineering

New Delhi, 110063
India

University of Mumbai

Mahatma Gandhi Road
Mumbai, 400032
India
