An End-to-End Model for Multi-View Scene Text Recognition

33 Pages Posted: 7 Apr 2023

Ayan Banerjee

Indian Statistical Institute, New Delhi - Indian Statistical Institute

Shivakumara Palaiahnakote

University of Malaya (UM)

Saumik Bhattacharya

Indian Institute of Technology (IIT), Kharagpur

Umapada Pal

Indian Statistical Institute, New Delhi - Indian Statistical Institute

Cheng-Lin Liu

affiliation not provided to SSRN

Abstract

Applications of surveillance and monitoring, such as person re-identification, vehicle re-identification, and sports event tracking, are increasing, and with them the need for text detection and end-to-end recognition. Although past deep learning-based models have addressed several challenges, such as arbitrary-shaped text, multiple scripts, and variation in the geometric structure of characters, their scope is limited to a single view. This paper presents an end-to-end model, called E2EMVSTR (End-to-End Model for Multi-View Scene Text Recognition), that recognizes text by refining multiple views of the same scene. Exploiting the characteristics that multi-view texts share, we propose a cycle-consistency, pairwise similarity-based deep learning model to find text more efficiently across three input views. The extracted text is then supplied to a Siamese network and a semi-supervised attention embedding combinational network to obtain recognition results. The proposed model combines natural language processing and genetic algorithm models to restore missing character information and correct wrong recognition results. In experiments on our multi-view dataset and several benchmark datasets, the proposed method is shown to be effective compared with state-of-the-art methods. The dataset and code will be made available to the public upon acceptance.
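
The abstract does not specify how the cycle consistency check over three views is computed. As a rough illustration only, the sketch below shows one generic way such a check could work: each text candidate in view A is matched to its most similar candidate in view B, then to view C, then back to view A, and is kept only if the chain returns to where it started. The function names, the cosine similarity measure, and the toy feature vectors are all assumptions made for this example; they are not taken from the paper, which describes a learned deep model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(query, candidates):
    """Index of the candidate most similar to the query (hypothetical helper)."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))

def cycle_consistent(view_a, view_b, view_c):
    """Return (i, j, k) triples whose matches A[i] -> B[j] -> C[k] -> A[i] close the cycle."""
    triples = []
    for i, feat in enumerate(view_a):
        j = best_match(feat, view_b)           # A -> B
        k = best_match(view_b[j], view_c)      # B -> C
        if best_match(view_c[k], view_a) == i: # C -> A must return to the start
            triples.append((i, j, k))
    return triples

# Toy features (assumed): two real text regions seen in all three views,
# plus one spurious candidate that appears only in view A.
view_a = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.6, 0.6, 0.5]]
view_b = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.1]]
view_c = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]]

matches = cycle_consistent(view_a, view_b, view_c)
print(matches)  # [(0, 1, 0), (1, 0, 1)]
```

In this toy case the third candidate in view A is discarded because its chain of matches ends at a different candidate than it started from, which is the intuition behind using cycle consistency to filter detections that are not supported by the other views.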

Keywords: text detection, scene text recognition, Siamese network, natural language model, genetic algorithm, multi-view text detection

Suggested Citation

Banerjee, Ayan and Palaiahnakote, Shivakumara and Bhattacharya, Saumik and Pal, Umapada and Liu, Cheng-Lin, An End-to-End Model for Multi-View Scene Text Recognition. Available at SSRN: https://ssrn.com/abstract=4412848 or http://dx.doi.org/10.2139/ssrn.4412848

Ayan Banerjee

Indian Statistical Institute, New Delhi - Indian Statistical Institute

New Delhi
New Delhi, 110016
India

Shivakumara Palaiahnakote (Contact Author)

University of Malaya (UM)

Institute of Mathematical Sciences, Faculty of Sci
University of Malaya, Lembah Pantai
Kuala Lumpur, 50603
Malaysia

Saumik Bhattacharya

Indian Institute of Technology (IIT), Kharagpur

Kharagpur
IIT Kharagpur
Kharagpur, IN West Bengal 721302
India

Umapada Pal

Indian Statistical Institute, New Delhi - Indian Statistical Institute

New Delhi
New Delhi, 110016
India

Cheng-Lin Liu

affiliation not provided to SSRN

Nigeria
