Machine Learning for Pattern Discovery in Management Research
44 pages. Posted: 14 Jan 2020. Last revised: 26 Jun 2020.
Date Written: June 23, 2020
Supervised machine learning (ML) methods are a powerful toolkit for discovering robust patterns in quantitative data. These patterns can inform exploratory inductive or abductive research, or support post-hoc analysis of regression results by detecting patterns that may have gone unnoticed. However, ML results should not be treated as the outcome of a deductive causal test. To demonstrate ML for pattern discovery, we apply ML algorithms to study employee turnover at a large technology company. We interpret the relationships between variables using partial dependence plots, which uncover surprising nonlinear and interdependent patterns that traditional methods can easily overlook. For readers evaluating ML for pattern discovery, we explain how to assess model performance, highlight the human decisions involved in the process, and warn of common interpretation pitfalls. An online appendix provides code and data to implement the algorithms demonstrated in the paper.
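A partial dependence plot shows how a fitted model's average prediction changes as one variable is varied while all other variables keep their observed values. The sketch below illustrates that computation in plain Python, assuming only that the fitted model can be called as a function from a feature row to a predicted turnover probability. The names `partial_dependence` and `toy_turnover_model`, the two features (tenure and overtime), the thresholds, and the data rows are hypothetical illustrations. They are not the paper's model, variables, or data; the online appendix holds the actual code.

```python
from typing import Callable, List, Sequence

Row = List[float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: int,
    grid: Sequence[float],
) -> List[float]:
    """One-way partial dependence of a fitted model on one feature.

    For each grid value v, set column `feature` to v in every row, keep the
    other columns at their observed values, and average the predictions:
        PD(v) = (1/n) * sum_i predict(row_i with row_i[feature] = v)
    """
    if not rows:
        raise ValueError("rows must be non-empty")
    curve = []
    for v in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the observed data is not mutated
            modified[feature] = v
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


# Hypothetical toy model standing in for a fitted classifier (not the
# paper's model). Features are [tenure_years, overtime_hours].
def toy_turnover_model(row: Row) -> float:
    tenure, overtime = row
    risk = 0.5 if tenure < 2 else 0.25  # hypothetical threshold (nonlinear) effect
    if tenure < 2 and overtime > 10:
        risk += 0.25  # hypothetical interaction: overtime matters only at low tenure
    return risk


# Hypothetical rows, invented for illustration only.
ROWS = [[1.0, 5.0], [3.0, 12.0], [0.5, 15.0], [4.0, 2.0]]

if __name__ == "__main__":
    print(partial_dependence(toy_turnover_model, ROWS, 0, [1.0, 3.0]))   # [0.625, 0.25]
    print(partial_dependence(toy_turnover_model, ROWS, 1, [5.0, 15.0]))  # [0.375, 0.5]
```

In this toy example the overtime curve rises only because of its interaction with low tenure, which is the kind of interdependent pattern a partial dependence plot can surface. Because each point is an average over the other variables, the curve describes the model's behavior rather than a causal effect. In practice, scikit-learn provides this computation through `sklearn.inspection.partial_dependence` and plots it with `PartialDependenceDisplay`.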
Keywords: machine learning, supervised machine learning, induction, abduction, exploratory data analysis, pattern discovery, decision trees, random forests, neural networks, ROC curve, confusion matrix, partial dependence plots