What are You Saying? Using Topic to Detect Financial Misreporting
76 Pages Posted: 5 Jul 2016 Last revised: 26 Mar 2018
Date Written: March 21, 2018
This study uses a machine learning technique to assess whether the thematic content of financial statement disclosures (labeled as topic) is incrementally informative in predicting intentional misreporting. Using a Bayesian topic modeling algorithm, we determine and empirically quantify the topic content of a large collection of 10-K narratives spanning the 1994 to 2012 period. We find that the algorithm produces a valid set of semantically meaningful topics that are predictive of financial misreporting based on samples of SEC enforcement actions (AAERs) and irregularity restatements arising from intentional GAAP violations. Our out-of-sample tests indicate that topic significantly improves the detection of financial misreporting when added to models based on commonly-used financial and textual style variables. Furthermore, we find that models including topic outperform traditional models when predicting long-duration misstatements. These results are robust to alternative topic definitions and regression specifications and various controls for firms with repeated instances of financial misreporting.
Keywords: Topic, Disclosure, Latent Dirichlet Allocation, Financial Misreporting
Suggested Citation: Suggested Citation