Semantic Diversification in Equity Portfolios
16 Pages
Posted: 7 Jan 2025
Date Written: January 07, 2025
Abstract
This paper employs several Natural Language Processing (NLP) techniques to assess whether text analysis can provide diversification benefits in portfolio management. Two widely used large language models (BERT and GPT) and an alternative AI approach grounded in neuroscience, called "semantic fingerprinting", are put to the test by constructing and comparing "minimum semantic concentration" portfolios with each of the three NLP methods in turn. Exploiting text data for financial diversification has value in terms of risk reduction. The weights of the "minimum semantic concentration" portfolio minimise semantic similarity (akin to the weights of the "minimum variance" portfolio, which minimise variance), on the premise that the returns of similar companies tend to be correlated, so that portfolios of dissimilar companies are apt to offer higher diversification benefits. The results confirm that all three NLP methods extract relevant information from business descriptions: the "minimum semantic concentration" portfolios have significantly lower volatility than portfolios constructed with randomly chosen weights. No NLP method can claim absolute superiority over its peers, but semantic fingerprinting appears to be the most consistent and robust performer. BERT and GPT demonstrate their potential but also a caveat: their performance is volatile even across very similar tasks. In the race to harness the power of Artificial Intelligence in virtually every field, researchers and practitioners face an ever-increasing supply of methods that have not undergone field-specific tests. This paper guides the choice of researchers in economics and finance by comparing the performance of different NLP methods on a task that is fundamental to the field, namely portfolio diversification.
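The idea of a "minimum semantic concentration" portfolio can be illustrated with a minimal sketch. The abstract does not state the exact optimisation problem, so everything below is an assumption made for illustration: semantic similarity is taken to be the cosine similarity between text embeddings, concentration is taken to be the quadratic form w'Sw (by analogy with portfolio variance w'Σw), weights are assumed long-only and fully invested, and the solver is a plain Frank-Wolfe iteration. The company names and embedding vectors are hypothetical toy data, not output from BERT, GPT, or semantic fingerprinting.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_matrix(embeddings):
    """Pairwise cosine similarities (a positive semi-definite matrix)."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]

def quadratic_form(x, sim):
    """x' S x; for portfolio weights this is the assumed concentration measure."""
    n = len(x)
    return sum(x[i] * sim[i][j] * x[j] for i in range(n) for j in range(n))

def min_concentration_weights(sim, iterations=2000):
    """Minimise w' S w subject to w >= 0 and sum(w) = 1 (Frank-Wolfe sketch)."""
    n = len(sim)
    w = [1.0 / n] * n  # start from the equal-weight portfolio
    for _ in range(iterations):
        grad = [2.0 * sum(sim[i][j] * w[j] for j in range(n)) for i in range(n)]
        k = min(range(n), key=grad.__getitem__)  # steepest-descent vertex
        d = [(1.0 if i == k else 0.0) - w[i] for i in range(n)]
        slope = sum(g * di for g, di in zip(grad, d))
        if slope >= -1e-12:  # no feasible descent direction left
            break
        curvature = quadratic_form(d, sim)
        # Exact line search for a quadratic objective, clipped to [0, 1].
        step = 1.0 if curvature <= 0.0 else min(1.0, -slope / (2.0 * curvature))
        w = [wi + step * di for wi, di in zip(w, d)]
    return w

# Hypothetical toy embeddings: two similar banks, one tech firm, one utility.
names = ["bank_a", "bank_b", "tech", "utility"]
embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]

sim = similarity_matrix(embeddings)
weights = min_concentration_weights(sim)
equal_weights = [1.0 / len(names)] * len(names)

for name, weight in zip(names, weights):
    print(f"{name}: {weight:.3f}")
print("concentration (optimised):", round(quadratic_form(weights, sim), 4))
print("concentration (equal weight):", round(quadratic_form(equal_weights, sim), 4))
```

In this toy setting the two banks are near-duplicates in embedding space, so holding both adds concentration without adding a new exposure. A long-only constraint is only one possible choice; dropping it would give a closed-form solution analogous to the unconstrained minimum variance portfolio.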
Keywords: Text Analysis, Portfolio Performance, Natural Language Processing, BERT, GPT, Semantic Fingerprinting
JEL Classification: G11, G19