Detecting Financial Misconduct Using NLP and Machine Learning: Evidence from Japan

Yazawa, Kenichi; Araragi, Kazuo; Itakura, Yoshinao; Usuki, Teppei; Hattori, Daichi; Mizuno, Satoshi

doi:10.2139/ssrn.5212010

34 pages. Posted: 16 Apr 2025. Last revised: 17 Apr 2025.

Satoshi Mizuno, KPMG Azsa LLC

Date Written: August 25, 2024

Abstract

This study develops a novel model for detecting financial fraud using textual data extracted from the annual securities reports of Japanese listed companies from 2010 to 2019. The analysis focuses on Management's Discussion and Analysis (MD&A) and on broader textual disclosures, including corporate policies and strategies, risk factors, and governance practices. A set of linguistic variables is constructed using natural language processing (NLP) techniques. These variables are combined with financial data to build a Weighted Random Forest (WRF) model, which achieves an AUC of 0.907. The study identifies the following characteristics of fraudulent companies:

(1) In the MD&A section: a negative tone, greater complexity, and fewer ratio-based expressions.
(2) In risk information: a positive tone and frequent references to third parties.
(3) In governance disclosures: greater readability but fewer named entities.

Overall, the study shows that textual data offers an effective new approach to predicting financial fraud and could contribute to corporate fraud prevention.
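
The abstract relies on two technical ingredients:

- A Weighted Random Forest, which counteracts the rarity of fraud cases by up-weighting the minority class.
- AUC as the evaluation metric.

The Python sketch below illustrates both under stated assumptions. It is not the authors' pipeline. The inverse-frequency weighting formula is an assumed choice, the heuristic scikit-learn calls "balanced". The function names `balanced_class_weights`, `weighted_gini`, and `roc_auc` are hypothetical. A full WRF would grow many bootstrapped trees, using a class-weighted impurity like this one to choose splits and weighted votes at the leaves.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a node where each sample counts with its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    w = sum(totals.values())
    if w == 0:
        return 0.0
    return 1.0 - sum((t / w) ** 2 for t in totals.values())


def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) outscores a random
    negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy node: 8 non-fraud (0) and 2 fraud (1) firm-years.
    node = [0] * 8 + [1] * 2
    w = balanced_class_weights(node)
    print(w)  # {0: 0.625, 1: 2.5}
    print(weighted_gini(node, w))  # 0.5: the rare class now weighs as much as the common one
    print(weighted_gini(node, {0: 1.0, 1: 1.0}))  # about 0.32 without weighting
    # Hypothetical model scores for five firm-years.
    print(roc_auc([0, 0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8, 0.3]))  # 4 of 6 pairs ranked correctly
```

The weighting raises the impurity of a node dominated by non-fraud cases. Splits that isolate the few fraud cases are therefore rewarded rather than ignored. In practice, scikit-learn's `RandomForestClassifier(class_weight="balanced")` and `roc_auc_score` provide comparable functionality for a full model.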

Keywords: Financial fraud, natural language processing, machine learning, annual securities reports, MD&A, corporate governance, risk analysis

Suggested Citation:

Yazawa, Kenichi and Araragi, Kazuo and Itakura, Yoshinao and Usuki, Teppei and Hattori, Daichi and Mizuno, Satoshi, Detecting Financial Misconduct Using NLP and Machine Learning: Evidence from Japan (August 25, 2024). Available at SSRN: https://ssrn.com/abstract=5212010 or http://dx.doi.org/10.2139/ssrn.5212010