Whitebox inside blackbox: Using interpretable language models to analyze financial narratives

58 pages. Posted: 25 Sep 2024. Last revised: 20 Oct 2022.

Sean Cao

University of Maryland - Robert H. Smith School of Business

Yongtae Kim

Santa Clara University - Leavey School of Business

Angie Wang

The Hong Kong Polytechnic University - School of Accounting and Finance

Houping Xiao

Georgia State University - J. Mack Robinson College of Business

Date Written: April 3, 2020

Abstract

This paper applies a technological innovation, interpretable language models within the framework of large language models (LLMs), to financial innovation by extracting context-specific information from unstructured financial narratives with high interpretability. While LLMs such as Google BERT and GPT spur financial innovations based on the analysis of unstructured financial texts, they are black-box models: users cannot see the intermediate syntactic features they capture. We introduce an interpretable language model that visualizes intermediate textual features by integrating neural network-based parsing and word embedding. Although this interpretable component is part of the LLM ecosystem, understanding it helps researchers explore the full potential of LLMs without sacrificing interpretability. We demonstrate the usefulness of this interpretable language model by constructing a performance-specific sentiment measure for earnings conference calls that explains cross-sectional returns and future operating performance better than common sentiment proxies. Unlike Google BERT, our model makes the source of the improved performance interpretable by visualizing syntax and contextual negations, which lets us separate performance-related discussion from irrelevant content such as talk about location or weather. A second application, analyzing forward-looking statements in conference calls, further confirms the value of our approach in improving LLM interpretability.
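As a rough illustration of what a performance-specific, negation-aware sentiment score involves, the following toy sketch may help. It is not the authors' model: the paper integrates neural network-based parsing and word embedding, whereas this sketch substitutes a fixed token window for syntactic negation scope and keyword matching for context detection. All word lists, the window size, and the function names are hypothetical and chosen only for the example.

```python
import re

# Hypothetical toy lexicons; the paper's actual vocabularies are not given in the abstract.
PERFORMANCE_TERMS = {"revenue", "earnings", "margin", "sales", "profit"}
POSITIVE = {"strong", "grew", "improved", "record"}
NEGATIVE = {"weak", "declined", "fell", "disappointing"}
NEGATORS = {"not", "no", "never", "without"}
NEGATION_WINDOW = 3  # assumption: a negator flips sentiment words in the next 3 tokens


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_score(tokens):
    """Sum word polarities, flipping any word preceded by a nearby negator."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - NEGATION_WINDOW):i]
        if any(w in NEGATORS for w in preceding):
            polarity = -polarity
        score += polarity
    return score


def performance_sentiment(text):
    """Return (sentence, score) pairs for performance-related sentences only."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = tokenize(sentence)
        if PERFORMANCE_TERMS.isdisjoint(tokens):
            continue  # skip content unrelated to performance, e.g. weather talk
        results.append((sentence, sentence_score(tokens)))
    return results


call_excerpt = (
    "Revenue was not strong this quarter. "
    "The weather in Atlanta was strong and sunny. "
    "Margin improved."
)
for sentence, score in performance_sentiment(call_excerpt):
    print(score, sentence)
```

Returning per-sentence scores, rather than one aggregate number, keeps the intermediate steps inspectable. A plain word count over the same excerpt would treat all three sentences as positive, while this sketch negates the first and ignores the second.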

Keywords: textual analysis, machine learning, neural networks, natural language processing, sentiment analysis, conference calls, technological advances, financial innovation

Suggested Citation

Cao, Sean S. and Kim, Yongtae and Wang, Angie and Xiao, Houping, Whitebox inside blackbox: Using interpretable language models to analyze financial narratives (April 3, 2020). Available at SSRN: https://ssrn.com/abstract=3568504 or http://dx.doi.org/10.2139/ssrn.3568504

Sean S. Cao

University of Maryland - Robert H. Smith School of Business

College Park, MD 20742-1815
United States

Yongtae Kim (Contact Author)

Santa Clara University - Leavey School of Business

500 El Camino Real
Santa Clara, CA 95053
United States
(408) 554-4667 (Phone)
(408) 554-2331 (Fax)

Angie Wang

The Hong Kong Polytechnic University - School of Accounting and Finance

Houping Xiao

Georgia State University - J. Mack Robinson College of Business

P.O. Box 4050
Atlanta, GA 30303-3083
United States

Paper statistics

Downloads: 945
Abstract Views: 5,762
Rank: 61,242