Textual Regression for Realized Volatility: A Model for Long-Term Forecasting
58 Pages Posted: 10 Apr 2025 Last revised: 21 Feb 2025
Date Written: February 01, 2025
Abstract
This study investigates the role of textual information in forecasting realized volatility in financial markets. Using state-of-the-art large language models (LLMs) such as LLaMA, OPT, BERT, and RoBERTa, we develop novel textual regression models to predict one-day-, one-week-, two-week-, and one-month-ahead volatility in the corn, soybean, and wheat markets from news articles published by media agencies. Our results show that these models significantly outperform autoregressive benchmarks such as the heterogeneous autoregressive (HAR) model, as well as sentiment-based approaches and news-count benchmarks, particularly in long-term forecasting. The findings also reveal relationships between forecasting accuracy and both the size and the type of the language model. We address potential look-ahead bias concerns through robust comparisons of finance-specific and general-purpose language models. Shapley value analysis demonstrates that market-related terms enhance short-term forecasts, while production- and weather-related terms drive long-term predictions. Our findings underscore the utility of textual data for forecasting volatility and provide a foundation for further applications in financial markets and risk management.
Keywords: Text regression, Natural language processing (NLP), Large language model (LLM), Agriculture, Volatility modelling, Forecasting