Ranking factors with news
32 Pages Posted: 13 Dec 2024
Date Written: October 21, 2024
Abstract
Recent literature determines feature importance by setting firm characteristics to zero, one at a time, and ranking the resulting declines in the variance explained by the machine learning model. Combined with annually updated machine learning models, this ranking procedure identifies most monthly-updated firm characteristics as dominant over those updated quarterly or annually. Our novel approach levels the playing field among characteristics by finding substitute monthly-updated news content for each firm characteristic and then ranking the performance of machine learning models in the absence of a focal firm characteristic. We find that a subset of firm characteristics related to trading volume, volatility, and asset growth is replaceable with firm-level news, and that this replaceability varies over time. Another subset of firm characteristics, including short-term momentum, momentum reversal, asset turnover, employment, and R&D, cannot be replaced with firm-level news when financial conditions are not too relaxed for too long. The update frequency of firm characteristics does not determine this distinction. Our numerical analysis demonstrates that removing the identified substitutable characteristics each year, using historical data, boosts the Sharpe ratio of the machine learning long-short portfolio to 0.89 for the period 2007 to 2019. The Sharpe ratio improves further, to 0.94, when the identified most impactful words are incorporated to replace the substitutable characteristics.
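The zero-out ranking procedure described at the start of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear model, the synthetic data, the names `char_a`, `char_b`, `char_c`, and the choice of an R-squared measured against a zero forecast are all assumptions made for illustration.

```python
import numpy as np

def r2_zero_benchmark(y, y_hat):
    # R^2 against a zero forecast (an assumed convention, not taken from the paper).
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum(y ** 2)

def zero_out_ranking(predict, X, y, names):
    # Set one characteristic at a time to zero and record the drop in R^2.
    base = r2_zero_benchmark(y, predict(X))
    drops = {}
    for j, name in enumerate(names):
        X_zeroed = X.copy()
        X_zeroed[:, j] = 0.0
        drops[name] = base - r2_zero_benchmark(y, predict(X_zeroed))
    # Largest decline first: the most important characteristic leads the list.
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic example with hypothetical characteristics of decreasing relevance.
rng = np.random.default_rng(0)
names = ["char_a", "char_b", "char_c"]
X = rng.standard_normal((500, 3))
y = X @ np.array([0.5, 0.1, 0.0]) + 0.1 * rng.standard_normal(500)

# A least-squares fit stands in for the machine learning model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda features: features @ beta_hat

ranking = zero_out_ranking(predict, X, y, names)
for name, drop in ranking:
    print(f"{name}: decline in R^2 = {drop:.4f}")
```

In this toy setting the ranking simply recovers the ordering of the true coefficients. The substitution step the abstract proposes, replacing a focal characteristic with news content before re-ranking, is not sketched here.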
Keywords: Machine learning, Return prediction, Financial news media, Text-based analysis
Suggested Citation:
(October 21, 2024). Available at SSRN: https://ssrn.com/abstract=4994186 or http://dx.doi.org/10.2139/ssrn.4994186