Can Words Reveal Fraud? A Lexicon Approach to Detecting Fraudulent Financial Reporting

60 Pages
Posted: 28 Jan 2024

Date Written: June 8, 2022

Abstract

I provide a method for developing a fraud lexicon: a list of predictive words that can help detect fraudulent financial reporting. I also apply a Balanced Random Forest classifier and show that it is well suited to predicting and detecting fraud in financial reports when faced with significant class imbalance. The classifier uses the fraud lexicon as its feature set and achieves strong classification accuracy across multiple samples of fraud firms and peer firms over the 18 years from 2000 to 2017, with out-of-sample performance 40 to 48 percent better than a random guess. I further show the incremental classification performance of the fraud lexicon and compare my classifier with alternative language-based fraud detection techniques. The fraud lexicon developed in this study provides a word list that researchers and practitioners can use in “bag-of-words” analysis to detect fraud, and the classifier can be used for statistical textual analysis. The results may also be of interest to all users of financial statements, including regulators, auditors, and investors, who want to enhance their fraud risk assessment procedures and detect fraudulent financial reporting.
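To make the two ideas in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It shows (1) how a document can be turned into "bag-of-words" counts over a fixed lexicon and (2) the balanced bootstrap step that a Balanced Random Forest typically relies on, where each tree is trained on a sample that undersamples the majority class. The words in `LEXICON`, the function names, and the toy data are all hypothetical placeholders; the paper's actual fraud lexicon is not reproduced here, and the tree-fitting step is omitted.

```python
import random
import re
from collections import Counter

# Hypothetical placeholder words; NOT the fraud lexicon developed in the paper.
LEXICON = ["restate", "investigation", "allegation", "impairment"]


def lexicon_counts(text, lexicon=LEXICON):
    """Return one count per lexicon word (a bag-of-words feature vector)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in lexicon]


def balanced_bootstrap(labels, rng):
    """Draw row indices so that every class contributes equally.

    Each class is sampled with replacement, and the number of draws per
    class equals the size of the smallest class, so the majority class
    is undersampled. A Balanced Random Forest would fit one tree on each
    such sample (assumed here; tree fitting is not shown).
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


# Toy data: 1 = fraud firm, 0 = peer firm (illustrative only).
docs = [
    "The company will restate earnings after an investigation.",
    "Revenue grew steadily during the year.",
    "No impairment was recorded.",
    "Operations continued as planned.",
]
labels = [1, 0, 0, 0]

features = [lexicon_counts(d) for d in docs]
rows = balanced_bootstrap(labels, random.Random(0))
balanced_X = [features[i] for i in rows]
balanced_y = [labels[i] for i in rows]
```

In practice one would likely use an existing library implementation of the classifier rather than hand-rolling the sampling; the sketch only illustrates why the balanced sample counters the class imbalance the abstract describes.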

Keywords: Fraud detection, natural language processing, computational linguistics

Suggested Citation

Ahmed, Daniyal, Can Words Reveal Fraud? A Lexicon Approach to Detecting Fraudulent Financial Reporting (June 8, 2022). Available at SSRN: https://ssrn.com/abstract=4693437 or http://dx.doi.org/10.2139/ssrn.4693437

Daniyal Ahmed (Contact Author)

PricewaterhouseCoopers LLP

18 York St
Suite 2600
Toronto, Ontario M5J0B2
Canada
6475504284 (Phone)
L5M 0G8 (Fax)
