Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)

56 Pages Posted: 11 Jan 2023

Stephen Meisenbacher

Technische Universität München (TUM)

Peter Norlander

Loyola University Chicago, Quinlan School of Business, Department of Management; Global Labor Organization (GLO); IDEAL The Institute for Data, Econometrics, Algorithms, and Learning

Date Written: December 19, 2022

Abstract

Popular approaches to building data from unstructured text are limited in scalability, interpretability, replicability, and real-world applicability. These limitations can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds accurate, reproducible, structured, and labeled datasets. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. It produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, and expert classification of any documents with any scheme. To demonstrate this process for building data from text with machine learning, we publish open-source resources: the software, a new public document corpus, and a replicable analysis that builds an interpretable classifier of suspected "no poach" clauses in franchise documents.
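
As a rough illustration of the general idea of rule-assisted labeling described in the abstract, the sketch below extracts short context windows around a keyword, labels them with expert-written rules, and aggregates the labels to one row per document. It is not the CRAML software or its interface: the keyword, the two rules, the window width, the function names, and the sample texts are all hypothetical, and the step of training a classifier on the labeled windows is omitted.

```python
import re

# Hypothetical keyword and context rules, written for illustration only.
# Each rule pairs a regular expression with the label it assigns.
KEYWORD = re.compile(r"\bemploy\w*", re.IGNORECASE)
RULES = [
    (re.compile(r"\bnot\s+(?:solicit|recruit|hire|employ)\b", re.IGNORECASE), 1),
    (re.compile(r"\bmay\s+employ\b", re.IGNORECASE), 0),
]


def context_windows(text, width=5):
    """Yield `width` words on either side of each word matching KEYWORD."""
    words = text.split()
    for i, word in enumerate(words):
        if KEYWORD.search(word):
            yield " ".join(words[max(0, i - width): i + width + 1])


def label_window(window):
    """Return the label of the first matching rule, or None if no rule applies."""
    for pattern, label in RULES:
        if pattern.search(window):
            return label
    return None


def document_level(corpus):
    """Map each document id to 1 if any of its windows is labeled 1, else 0."""
    return {
        doc_id: int(any(label_window(w) == 1 for w in context_windows(text)))
        for doc_id, text in corpus.items()
    }


# Made-up sample texts, not drawn from the authors' corpus.
corpus = {
    "a": "Franchisee shall not solicit or employ any person employed by another franchisee.",
    "b": "The franchisee may employ staff as needed.",
}
print(document_level(corpus))
```

Because every label traces back to a readable rule and a specific window of text, a reviewer can inspect why a document was flagged, which is the kind of transparency the abstract emphasizes.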

Keywords: machine learning, natural language processing, text classification, big data

JEL Classification: B41, C38, C81, C88, J08, J41, J42, J47, J53, and Z13

Suggested Citation

Meisenbacher, Stephen and Norlander, Peter, Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML) (December 19, 2022). Available at SSRN: https://ssrn.com/abstract=4321894 or http://dx.doi.org/10.2139/ssrn.4321894

Stephen Meisenbacher

Technische Universität München (TUM)

Peter Norlander (Contact Author)

Loyola University Chicago, Quinlan School of Business, Department of Management

16 East Pearson St
Chicago, IL 60611
United States
312-915-6615 (Phone)

HOME PAGE: https://www.luc.edu/quinlan/faculty/peternorlander.shtml

Global Labor Organization (GLO)

Cologne
Germany

HOME PAGE: https://glabor.org/user/pnorlander/

IDEAL The Institute for Data, Econometrics, Algorithms, and Learning

Chicago, IL
