43 Pages. Posted: 19 Jul 2007
Date Written: July 19, 2007
The paper presents a complete method for coding printed text pages using automatic techniques. The method involves three automatic steps and one or two steps of manual correction to obtain fully accurate results. We found that present-day consumer digital cameras are much better than high-end scanners for capturing images of printed pages quickly and without the wear and tear associated with scanners. We also found that high-end ($370) OCR software is much more cost-effective for achieving accurate text recognition and for processing large amounts of data. We also describe how researchers can write a computer program that automatically classifies non-uniform data. We provide detailed instructions for each step of the automatic coding method so that other researchers can readily replicate it.
Keywords: data collection, automatic coding, research methods
JEL Classification: C80, N80, M10
Suggested Citation:
Murmann, Johann Peter and Homburg, Ernst and Geven, Ruud and Bermiss, Y Sekou and Forgione, Alfonzo, Automatic Coding of Printed Materials (July 19, 2007). Available at SSRN: https://ssrn.com/abstract=1001568 or http://dx.doi.org/10.2139/ssrn.1001568