FinBERT - A Large Language Model for Extracting Information from Financial Text

74 Pages. Posted: 27 Aug 2021. Last revised: 26 Sep 2022

Allen H. Huang

Hong Kong University of Science and Technology - Department of Accounting

Hui Wang

Renmin University of China - School of Business

Yi Yang

HKUST Business School

Date Written: July 28, 2020

Abstract

We develop FinBERT, a state-of-the-art large language model that adapts to the finance domain. We show that FinBERT incorporates finance knowledge and can better summarize contextual information in financial texts. Using a sample of researcher-labeled sentences from analyst reports, we document that FinBERT substantially outperforms the Loughran and McDonald dictionary and other machine learning algorithms, including naïve Bayes, support vector machine, random forest, convolutional neural network, and long short-term memory, in sentiment classification. Our results show that FinBERT excels in identifying the positive or negative sentiment of sentences that other algorithms mislabel as neutral, likely because it uses contextual information in financial text. We find that FinBERT’s advantage over other algorithms and over Google’s original bidirectional encoder representations from transformers (BERT) model is especially salient when the training sample is small and when texts contain financial words not frequently used in general texts. FinBERT also outperforms other models in identifying discussions related to environment, social, and governance issues. Last, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 18%, compared with FinBERT. Our results have implications for academic researchers, investment professionals, and financial market regulators.

Keywords: Deep Learning; Large Language Model; Transfer Learning; Interpretable Machine Learning; Sentiment Classification; Environment, Social, and Governance (ESG)

JEL Classification: C45, D83, G14, G24, G32, M40, M41

Suggested Citation

Huang, Allen H. and Wang, Hui and Yang, Yi, FinBERT - A Large Language Model for Extracting Information from Financial Text (July 28, 2020). Contemporary Accounting Research, Forthcoming, Available at SSRN: https://ssrn.com/abstract=3910214 or http://dx.doi.org/10.2139/ssrn.3910214

Allen H. Huang (Contact Author)

Hong Kong University of Science and Technology - Department of Accounting

LSK Business School Building
HKUST
Clear Water Bay, Kowloon
Hong Kong

HOME PAGE: http://www.AllenHuang.org

Hui Wang

Renmin University of China - School of Business

Beijing
China

Yi Yang

HKUST Business School

Clear Water Bay
Kowloon, 999999
Hong Kong
