Finding Needles in a Haystack: Using Data Analytics to Improve Fraud Prediction
53 Pages. Posted: 7 Apr 2015. Last revised: 21 Apr 2016.
Date Written: April 6, 2015
Developing models to detect financial statement fraud involves challenges related to (i) the rarity of fraud observations, (ii) the relative abundance of explanatory variables identified in the prior literature, and (iii) the broad underlying definition of fraud. Following the emerging data analytics literature, we introduce and systematically evaluate three methods to address these challenges. Results from evaluating actual cases of financial statement fraud suggest that two of these methods improve fraud prediction performance by approximately ten percent relative to the best current techniques. Improved fraud prediction can result in meaningful benefits, such as improving the ability of the SEC to detect fraudulent filings and improving audit firms’ client portfolio decisions.
Keywords: Financial statement fraud, Data analytics, Fraud rarity, Risk assessment, Data rarity, Data imbalance, Undersampling
JEL Classification: M4, C1