How to Use Lexical Density of Company Filings

10 Pages. Posted: 13 Sep 2021

Date Written: September 10, 2021

Abstract

This paper analyzes the application of natural language processing (NLP) to 10-K and 10-Q company reports. Using the Brain Language Metrics on Company Filings (BLMCF) dataset, which tracks numerous language metrics on 10-K and 10-Q reports, we examine lexical metrics such as lexical richness, lexical density, and specific density.
In simple terms, lexical richness measures how many unique words the author uses. The idea is that the more varied the author's vocabulary, the more complex the text. Lexical density measures the structure and complexity of human communication in a text; a high lexical density indicates a large share of information-carrying words. Lastly, specific density measures how dense the report's language is from a financial point of view, in other words, how many finance-related words the text uses.
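As a rough illustration of the three definitions above, the following Python sketch computes each metric as a simple ratio over the tokens of a text. It is not the BLMCF methodology, which this abstract does not specify: the tokenizer, the `FUNCTION_WORDS` and `FINANCE_WORDS` lists, and the choice of a type-token ratio for lexical richness are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration).
# A real study would use a part-of-speech tagger or a full function-word
# list, and a proper finance dictionary.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "or", "in", "on", "to", "is", "was",
    "by", "for", "our", "we", "it", "that", "this", "with", "as", "at",
}
FINANCE_WORDS = {
    "revenue", "debt", "equity", "cash", "assets", "liabilities",
    "earnings", "dividend", "interest", "margin",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_metrics(text):
    """Return the three metrics as ratios between 0 and 1."""
    tokens = tokenize(text)
    if not tokens:
        return {
            "lexical_richness": 0.0,
            "lexical_density": 0.0,
            "specific_density": 0.0,
        }
    n = len(tokens)
    return {
        # Unique words divided by total words (type-token ratio).
        "lexical_richness": len(set(tokens)) / n,
        # Share of words that are not function words.
        "lexical_density": sum(t not in FUNCTION_WORDS for t in tokens) / n,
        # Share of words found in the finance word list.
        "specific_density": sum(t in FINANCE_WORDS for t in tokens) / n,
    }


if __name__ == "__main__":
    sample = "The company reported revenue and the debt of the company"
    print(lexical_metrics(sample))
```

A type-token ratio falls as a text gets longer, so comparing filings of very different lengths would likely call for a length-adjusted variant.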
Overall, this type of alternative data produces interesting results. Although lexical richness produced the weakest results of our strategies when applied to an investment universe of 500 stocks, its results improved significantly when we expanded the universe to 3000 stocks. Moreover, the strategies based on lexical density and specific density improved the Sharpe ratio even further.
In the last section, we combine the two metrics (lexical density and specific density) in one strategy. Applying both metrics to the investment universe of 500 stocks produces a Sharpe ratio of 0.688.
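For readers unfamiliar with the performance measure quoted above, the following Python sketch shows one common way to compute an annualized Sharpe ratio from a series of periodic strategy returns. The abstract does not state the return frequency, the risk-free rate, or the annualization convention the authors used, so the function name `annualized_sharpe`, the monthly default, and the zero risk-free rate are all assumptions.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=12, risk_free_per_period=0.0):
    """Annualized Sharpe ratio of a list of periodic returns.

    Assumptions (not taken from the paper): returns are simple periodic
    returns, monthly by default, and the risk-free rate is a constant
    per-period value.
    """
    excess = [r - risk_free_per_period for r in returns]
    if len(excess) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")
    # Mean excess return per unit of volatility, scaled to one year.
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical monthly returns, for illustration only.
    monthly_returns = [0.012, -0.004, 0.021, 0.003, -0.010, 0.015]
    print(round(annualized_sharpe(monthly_returns), 3))
```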

Keywords: Alternative data, Artificial Intelligence, Natural language processing, 10-K & 10-Q reports, lexical richness, lexical density

Suggested Citation

Hanicova, Daniela and Kalús, Filip and Vojtko, Radovan, How to Use Lexical Density of Company Filings (September 10, 2021). Available at SSRN: https://ssrn.com/abstract=3921091 or http://dx.doi.org/10.2139/ssrn.3921091

Daniela Hanicova

Quantpedia

Dulovo namestie 14
Bratislava, 85110
Slovakia

Filip Kalús

Quantpedia

Dulovo namestie 14
Bratislava, 85110
Slovakia

Radovan Vojtko (Contact Author)

Quantpedia

Dulovo namestie 14
Bratislava, 85110
Slovakia

HOME PAGE: http://Quantpedia.com
