What are You Saying? Using Topic to Detect Financial Misreporting
101 Pages. Posted: 5 Jul 2016. Last revised: 23 May 2019
Date Written: May 14, 2019
This study uses a machine learning technique to assess whether the thematic content of financial statement disclosures (labeled as topic) is incrementally informative in predicting intentional misreporting. Using a Bayesian topic modeling algorithm, we determine and empirically quantify the topic content of a large collection of 10-K narratives spanning the 1994 to 2012 period. We find that the algorithm produces a valid set of semantically meaningful topics that are predictive of financial misreporting based on samples of SEC enforcement actions (AAERs) and reporting irregularities identified from financial restatements and 10-K filing amendments. Our out-of-sample tests indicate that topic significantly improves the detection of financial misreporting, by as much as 59%, when added to models based on commonly used financial and textual style variables. Furthermore, models that incorporate topic as an additional predictor significantly outperform traditional models when detecting long-duration misreporting events. Taken together, our results suggest that the content of annual report narratives and the attention devoted to each topic are useful signals in detecting financial misreporting.
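To make the idea of quantifying "the attention devoted to each topic" concrete, the following is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, one standard inference method for this model. It is not the authors' implementation: the abstract does not state the inference procedure, and the function name `fit_lda`, the toy tokenized documents, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import random

def fit_lda(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists. Returns per-document topic proportions.
    alpha and beta are assumed symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic (unnormalized).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed share of each document's tokens assigned to each topic.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

# Hypothetical, already-tokenized snippets standing in for 10-K narratives.
docs = [
    ["revenue", "recognition", "contract", "revenue", "customer"],
    ["litigation", "settlement", "claim", "litigation", "court"],
    ["revenue", "customer", "contract", "sales", "recognition"],
    ["court", "claim", "settlement", "litigation", "legal"],
]

theta = fit_lda(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is one document's topic proportions. In a setup like the one the abstract describes, such proportions would be added as extra predictors to a misreporting detection model alongside financial and textual style variables; that second stage is not shown here.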
Keywords: Topic, Disclosure, Latent Dirichlet Allocation, Financial Misreporting