Semantic Diversification in Equity Portfolios

16 Pages. Posted: 7 Jan 2025

Crina Pungulescu

Toulouse Business School - Barcelona Campus

Date Written: January 07, 2025

Abstract

This paper employs several Natural Language Processing (NLP) techniques to assess whether text analysis can provide diversification benefits in portfolio management. Two widely used large language models (BERT and GPT) and an alternative AI solution grounded in neuroscience, called "semantic fingerprinting", are put to the test by creating and comparing "minimum semantic concentration" portfolios using each of the three NLP methods in turn. Exploiting text data for financial diversification proves valuable for risk reduction. The "minimum semantic concentration" portfolio weights minimise semantic similarity (akin to the weights of the "minimum variance" portfolio, which minimise return variance), on the premise that the returns of similar companies tend to be correlated and that portfolios of dissimilar companies are therefore apt to offer higher diversification benefits. The results confirm that all three NLP methods are able to extract relevant information from business descriptions: the "minimum semantic concentration" portfolios have significantly lower volatility than portfolios constructed with randomly chosen weights. While no NLP method can claim absolute superiority over its peers, semantic fingerprinting appears to be the most consistent and robust performer, whereas BERT and GPT demonstrate not only their potential but also a caveat: their performance is volatile even across very similar tasks. In the race to harness the power of Artificial Intelligence in virtually every field, researchers and practitioners face an ever-increasing supply of methods that have not undergone field-specific tests. This paper guides the choice of researchers in economics and finance by comparing the performance of different NLP methods at a task that is fundamental to the field, namely portfolio diversification.
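
The abstract describes the "minimum semantic concentration" weights only by analogy with the minimum variance portfolio, so the following Python sketch is an illustration under stated assumptions rather than the paper's actual procedure. It assumes that semantic similarity is measured as the cosine similarity between text embeddings of business descriptions, that the only constraint is that weights sum to one (short positions allowed), and that the similarity matrix S is invertible, in which case the weights minimising w'Sw have the same closed form as minimum variance weights with S in place of the covariance matrix. All function names and the toy embeddings are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities; the diagonal equals 1."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy of A
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def min_semantic_concentration_weights(S):
    """Minimise w' S w subject to sum(w) = 1 (assumption: shorts allowed).

    Closed form: w = S^{-1} 1 / (1' S^{-1} 1), mirroring minimum variance.
    """
    raw = solve_linear(S, [1.0] * len(S))
    total = sum(raw)
    return [x / total for x in raw]


def semantic_concentration(w, S):
    """Quadratic form w' S w: the similarity-weighted concentration."""
    n = len(w)
    return sum(w[i] * S[i][j] * w[j] for i in range(n) for j in range(n))


# Toy embeddings (made up): firms A and B are similar, firm C is unrelated.
embeddings = [
    [1.0, 0.0, 0.0],  # firm A
    [0.8, 0.6, 0.0],  # firm B, cosine similarity 0.8 with A
    [0.0, 0.0, 1.0],  # firm C, orthogonal to both
]
S = similarity_matrix(embeddings)
w = min_semantic_concentration_weights(S)
equal = [1.0 / len(S)] * len(S)

print([round(x, 3) for x in w])  # roughly [0.263, 0.263, 0.474]
print(semantic_concentration(w, S) < semantic_concentration(equal, S))
```

In this toy case the dissimilar firm C receives the largest weight, which is the intuition the abstract relies on: tilting towards semantically distinct companies lowers the portfolio's overall similarity relative to equal weighting. A long-only variant would need a constrained quadratic solver instead of the closed form.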

Keywords: Text Analysis, Portfolio Performance, Natural Language Processing, BERT, GPT, Semantic Fingerprinting

JEL Classification: G11, G19

Suggested Citation

Pungulescu, Crina, Semantic Diversification in Equity Portfolios (January 07, 2025). Available at SSRN: https://ssrn.com/abstract=5085603 or http://dx.doi.org/10.2139/ssrn.5085603

Crina Pungulescu (Contact Author)

Toulouse Business School - Barcelona Campus

C/ Trafalgar, 10
08010 Barcelona
Spain
