Whitebox inside blackbox: Using interpretable language models to analyze financial narratives
58 Pages Posted: 25 Sep 2024 Last revised: 20 Oct 2022
Date Written: April 3, 2020
Abstract
This paper applies a technological innovation, interpretable language models within the framework of large language models (LLMs), to financial innovation by extracting context-specific information from unstructured financial narratives with high interpretability. While LLMs such as Google BERT and GPT spur financial innovations based on the analysis of unstructured financial texts, they are black-box models: users cannot see the intermediate syntactic features they capture. We introduce an interpretable language model that visualizes intermediate textual features by integrating neural network-based parsing and word embedding. Although the interpretable component introduced in this study is part of the LLM ecosystem, understanding it helps researchers explore the full potential of LLMs without sacrificing interpretability. We demonstrate the usefulness of this interpretable language model by constructing a performance-specific sentiment measure for earnings conference calls that explains cross-sectional returns and future operating performance better than common sentiment proxies. Unlike Google BERT, our model makes the source of the improved performance interpretable by visualizing syntax and contextual negations, which enables us to separate performance-related discussions from irrelevant content such as talk about location or weather. A second application, analyzing forward-looking statements in conference calls, further confirms the value of our approach in improving LLM interpretability.
Keywords: textual analysis, machine learning, neural networks, natural language processing, sentiment analysis, conference calls, technological advances, financial innovation
