Line-Wise Script Identification from Handwritten Document Images Using SIFT Method
9 Pages Posted: 27 Feb 2018
Date Written: November 15, 2017
Automatic identification of scripts from document images helps in selecting the appropriate OCR engine for character recognition and content retrieval. In this paper, a line-wise script identification method based on the Scale-Invariant Feature Transform (SIFT) is proposed. Real-life handwritten script data were collected from different sources, such as articles and notes written by persons of different age groups and professions, for line segmentation. The line segmentation approach is based on histogram and connected component analysis: non-overlapping, oriented, and touching lines are extracted by computing the average height of a text line from the histogram profile, which forms the basis for text line segmentation. Features are extracted at the line level using SIFT, and a k-nearest neighbor (KNN) classifier is used to recognize the script. Experiments are performed on lines extracted from document images containing English, Kannada, and Devanagari scripts. The overall accuracy reported for the proposed system is encouraging for both bi-script and tri-script identification.
Keywords: Script recognition; Document image; bi-script; tri-script; SIFT; KNN; Scale invariant
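
The pipeline described in the abstract (histogram-based line segmentation followed by KNN classification of line-level features) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binarized page in which ink pixels are 1, it covers only the simple case of lines separated by blank rows (the handling of oriented and touching lines is not reproduced), and the fixed-length feature vectors passed to the classifier are placeholders for whatever line-level representation is built from SIFT descriptors. The function names segment_lines, average_line_height, and knn_predict are hypothetical.

```python
import numpy as np

def segment_lines(binary, min_height=2):
    """Return (top, bottom) row ranges of text lines in a binary page (ink = 1).

    Uses the horizontal projection profile: a run of rows that contain ink
    is treated as one text line. min_height is an assumed noise threshold.
    """
    profile = binary.sum(axis=1)      # number of ink pixels in each row
    has_ink = profile > 0
    bands, start = [], None
    for row, on in enumerate(has_ink):
        if on and start is None:
            start = row               # a text band begins
        elif not on and start is not None:
            bands.append((start, row))  # the band ends at the first blank row
            start = None
    if start is not None:             # band that runs to the bottom edge
        bands.append((start, len(has_ink)))
    return [(t, b) for t, b in bands if b - t >= min_height]

def average_line_height(bands):
    """Mean height in rows of the detected text bands."""
    return float(np.mean([b - t for t, b in bands])) if bands else 0.0

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to query (Euclidean)."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

if __name__ == "__main__":
    # Synthetic page with two "text lines" separated by blank rows.
    page = np.zeros((12, 20), dtype=np.uint8)
    page[1:4, 2:18] = 1
    page[6:10, 3:15] = 1
    bands = segment_lines(page)
    print(bands, average_line_height(bands))

    # Placeholder 2-D feature vectors standing in for line-level features.
    train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
    train_y = np.array(["English"] * 3 + ["Kannada"] * 3)
    print(knn_predict(train_x, train_y, np.array([0.2, 0.1])))
```

In a fuller version, the average line height could be used to decide where an unusually tall band should be split into several lines, and each cropped line image would be passed to a SIFT extractor before classification. How the variable number of SIFT descriptors per line is turned into a single input for the classifier is not specified in the abstract, so it is left open here.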