Text Region Detection and Recognition in Natural Scene Images Using MSER and Convolutional Neural Network
11 Pages Posted: 30 Nov 2020
Date Written: November 21, 2020
Text detection and recognition in natural scene images is a computer vision problem that has remained a challenge for computer engineers for a long time. Recent advances in deep learning have transformed computer vision. This paper attempts to build a Deep Learning (DL) based text detection and recognition model for interpreting the text in natural scene images. The proposed model consists of three stages: candidate text region detection, text region extraction, and text recognition. The natural scene image is first fed to the candidate text region detection stage, which extracts regions that potentially contain text characters. The non-text regions introduced in this first stage are filtered out in the second stage. The set of text regions resulting from the second stage is then recognized in the final stage. The Maximally Stable Extremal Region (MSER) algorithm is used for candidate text region detection. The proposed model uses two convolutional neural networks, one in the text region extraction stage and the other in the text recognition stage. Text detection in natural scenes is not as easy a problem as it appears. The complexity of detecting and recognizing text characters in natural scene images is mainly due to the diversity of the characters and of the scenes themselves, the presence of various disturbances, varying illumination conditions, and differences in the color, size, and area of the text. The ICDAR-2011, ICDAR-2013, CHARS-74K, and CIFAR-100 datasets are used for training and validating our models.
Keywords: Text region detection, text recognition, maximally stable extremal region, convolutional neural network
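The three-stage flow described in the abstract can be sketched as a small pipeline in which each stage is a pluggable function. This is only an illustration under stated assumptions, not the authors' implementation: the names `TextReadingPipeline`, `dark_components`, `big_enough`, and `placeholder_reader` are hypothetical, a plain connected-component search over dark pixels stands in for MSER, and trivial rules stand in for the two convolutional neural networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]   # grayscale rows, pixel values 0..255
Box = Tuple[int, int, int, int]   # (x, y, width, height)


@dataclass
class TextReadingPipeline:
    """Hypothetical wrapper: one callable per stage named in the abstract."""
    detect_candidates: Callable[[Image], List[Box]]  # stage 1 (MSER in the paper)
    is_text: Callable[[Image, Box], bool]            # stage 2 (first CNN in the paper)
    recognize: Callable[[Image, Box], str]           # stage 3 (second CNN in the paper)

    def run(self, image: Image) -> List[Tuple[Box, str]]:
        candidates = self.detect_candidates(image)
        text_boxes = [b for b in candidates if self.is_text(image, b)]
        return [(b, self.recognize(image, b)) for b in text_boxes]


def dark_components(image: Image, threshold: int = 128) -> List[Box]:
    """Stand-in for MSER (NOT MSER itself): bounding boxes of 4-connected
    groups of pixels darker than a single fixed threshold."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] >= threshold:
                continue
            # Flood-fill one component, collecting its pixel coordinates.
            stack = [(x0, y0)]
            seen[y0][x0] = True
            xs, ys = [], []
            while stack:
                x, y = stack.pop()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] < threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append((min(xs), min(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes


def big_enough(image: Image, box: Box) -> bool:
    """Stand-in for the text/non-text CNN: reject boxes under 4 pixels in area."""
    return box[2] * box[3] >= 4


def placeholder_reader(image: Image, box: Box) -> str:
    """Stand-in for the recognition CNN: always returns a placeholder label."""
    return "?"


pipeline = TextReadingPipeline(dark_components, big_enough, placeholder_reader)

demo_image = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0,   0, 255,   0],
    [255, 255, 255, 255, 255],
]
print(pipeline.run(demo_image))  # prints [((1, 1, 2, 2), '?')]
```

In the demo image, stage 1 proposes two candidate regions, stage 2 discards the isolated single pixel as non-text, and stage 3 labels the remaining 2x2 region. Keeping the stages as separate callables mirrors the structure in the abstract: a real MSER detector and trained networks could be swapped in without changing `run`.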