61 Pages. Posted: 5 Jul 2016. Last revised: 14 Sep 2016
Date Written: September 12, 2016
This study uses a machine learning technique to assess whether the thematic content of financial statement disclosures (labeled as topic) is incrementally informative in predicting intentional misreporting. Using a Bayesian topic modeling algorithm, we determine and empirically quantify the topic content of a large collection of 10-K narratives spanning the 1994 to 2012 period. We find that the algorithm produces a valid set of semantically meaningful topics that are predictive of financial misreporting, based on samples of SEC enforcement actions (AAERs) and irregularity restatements arising from intentional GAAP violations. Our out-of-sample tests indicate that models based on topic outperform models based on commonly used financial and textual style variables. Furthermore, we find that topic significantly improves the detection of high-risk accounting misstatements when added to models based on financial and textual style metrics. These results are robust to alternative topic definitions and regression specifications.
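As a rough illustration of how a Bayesian topic model can quantify the topic content of a document, the sketch below fits Latent Dirichlet Allocation with a small collapsed Gibbs sampler and returns one topic-proportion vector per document. It is a generic textbook-style sampler, not the authors' implementation: the function name `lda_topic_proportions`, the toy documents, the number of topics, and the hyperparameters `alpha` and `beta` are all assumptions made for illustration, since the abstract does not state these details.

```python
import random

def lda_topic_proportions(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch only).

    docs: list of token lists. Returns one topic-proportion vector per document.
    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # times word v is assigned to topic k
    topic_total = [0] * n_topics                            # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed share of each topic within each document.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical toy "filings", already tokenized; real 10-K text would need preprocessing.
toy_docs = [
    "revenue growth sales revenue customers sales".split(),
    "litigation claims settlement litigation court".split(),
    "sales customers revenue growth".split(),
    "court settlement claims litigation".split(),
]
theta = lda_topic_proportions(toy_docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` sums to one and could, in principle, enter a misreporting prediction model as explanatory variables alongside financial and textual style measures. How the study itself constructs and combines these variables is not described in the abstract.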
Keywords: Topic, Disclosure, Latent Dirichlet Allocation, Financial Misreporting
Suggested Citation:
Brown, Nerissa C. and Crowley, Richard M. and Elliott, W. Brooke, What are You Saying? Using Topic to Detect Financial Misreporting (September 12, 2016). 27th Annual Conference on Financial Economics and Accounting Paper. Available at SSRN: https://ssrn.com/abstract=2803733 or http://dx.doi.org/10.2139/ssrn.2803733