Automatic Coding of Printed Materials

43 Pages. Posted: 19 Jul 2007

Johann Peter Murmann

UNSW Australia Business School - AGSM

Ernst Homburg

University of Maastricht - Department of History

Ruud Geven

affiliation not provided to SSRN

Y Sekou Bermiss

University of Texas at Austin

Alfonzo Forgione

affiliation not provided to SSRN

Date Written: July 19, 2007

Abstract

The paper presents a complete method for coding printed text pages automatically. The method involves three automatic steps and one or two steps of manual correction to obtain fully accurate results. We found that present-day consumer digital cameras are much better than high-end scanners for capturing images of printed pages quickly and without the wear and tear associated with scanners. We also found that high-end ($370) OCR software is much more cost-effective for achieving accurate text recognition and for processing large amounts of data. We also describe how researchers can write a computer program that automatically classifies non-uniform data. We provide detailed instructions for each step of the automatic coding method so that other researchers can readily replicate it.

Keywords: data collection, automatic coding, research methods

JEL Classification: C80, N80, M10

Suggested Citation

Murmann, Johann Peter and Homburg, Ernst and Geven, Ruud and Bermiss, Y Sekou and Forgione, Alfonzo, Automatic Coding of Printed Materials (July 19, 2007). Available at SSRN: https://ssrn.com/abstract=1001568 or http://dx.doi.org/10.2139/ssrn.1001568

Johann Peter Murmann (Contact Author)

UNSW Australia Business School - AGSM

UNSW Sydney, NSW 2052
Australia
+61-2-9385-9733 (Phone)

HOME PAGE: http://professor-murmann.net

Ernst Homburg

University of Maastricht - Department of History

P.O. Box 616
Maastricht, 6200MD
Netherlands

Yerodin Sekou Bermiss

University of Texas at Austin

Austin, TX 78712
United States
