FinBERT—A Deep Learning Approach to Extracting Textual Information
56 Pages
Posted: 27 Aug 2021
Date Written: July 28, 2020
In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with the LM dictionary. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.
Keywords: Natural Language Processing; Machine Learning; Deep Learning; Textual Analysis; Sentiment Classification; Informativeness; Earnings Conference Call
JEL Classification: D83, G14, G30, M40, M41
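The abstract's observation that word-list methods mislabel sentences as neutral can be illustrated with a minimal sketch of a dictionary-based classifier. This is an illustration under stated assumptions, not the paper's method: the word lists `TOY_POSITIVE` and `TOY_NEGATIVE` and the function `dictionary_sentiment` are hypothetical stand-ins, and the lists are not the actual LM dictionary.

```python
import re

# Hypothetical toy word lists for illustration only; NOT the actual LM dictionary.
TOY_POSITIVE = {"strong", "improve", "improved", "gain", "gains", "profitable"}
TOY_NEGATIVE = {"loss", "losses", "decline", "declined", "weak", "impairment"}


def dictionary_sentiment(sentence: str) -> str:
    """Label a sentence by counting matches against the toy word lists."""
    # Lowercase and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(token in TOY_POSITIVE for token in tokens)
    neg = sum(token in TOY_NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # No matches, or a tie: a pure word-count method falls back to neutral.
    return "neutral"


if __name__ == "__main__":
    examples = [
        "Revenue showed strong gains this quarter.",
        "The segment reported a loss.",
        # Reads as negative in context, but contains no listed word.
        "We expect margins to come under pressure next year.",
    ]
    for text in examples:
        print(dictionary_sentiment(text), "|", text)
```

The third example contains no word from either list, so the count-based rule returns neutral even though a reader would likely see it as negative. A context-aware model such as the one the abstract describes is intended to pick up sentiment of this kind.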