Structure in the Tweet Haystack: Uncovering the Link between Text-Based Sentiment Signals and Financial Markets
45 Pages. Posted: 6 Sep 2015. Last revised: 8 Jan 2016
Date Written: October 1, 2015
We examine the relationship between signals derived from unstructured social media microblog text data and financial market developments. Employing statistical language modeling techniques, we construct directional user sentiment and non-directional topic disagreement metrics and link these to S&P 500 index returns and volatility. Based on an extensive five-year sample of Twitter messages, our study shows that both unsupervised and supervised statistical learning methods successfully identify subsets of expert users with a distinct finance focus. This allows us to filter out the substantial noise associated with social media. Accounting for salient properties of the time series in ARMA models, we document significant effects of expert disagreement signals on current and future S&P 500 volatility. Moreover, we detect a significant contemporaneous relation between expert sentiment signals and S&P 500 returns.
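To make the distinction between a directional sentiment signal and a non-directional disagreement signal concrete, the following is a minimal sketch, not the authors' construction. It assumes each message has already been classified as bullish (+1), bearish (-1), or neutral (0), and it uses a common textbook-style pair of definitions: sentiment as the net share of bullish messages and disagreement as sqrt(1 - sentiment^2). The function name `daily_signals` and the input layout are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def daily_signals(labeled_tweets):
    """Aggregate per-message labels into daily sentiment and disagreement.

    labeled_tweets: iterable of (date, label) pairs, with label in {+1, -1, 0}.
    Returns {date: (sentiment, disagreement)}.
    Assumed definitions (illustrative, not taken from the paper):
      sentiment    = (n_pos - n_neg) / (n_pos + n_neg), in [-1, 1]
      disagreement = sqrt(1 - sentiment**2), in [0, 1]
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for date, label in labeled_tweets:
        if label > 0:
            counts[date]["pos"] += 1
        elif label < 0:
            counts[date]["neg"] += 1
        # Neutral messages carry no direction and are ignored here.

    signals = {}
    for date, c in sorted(counts.items()):
        total = c["pos"] + c["neg"]
        if total == 0:
            continue  # no directional messages on this day
        sentiment = (c["pos"] - c["neg"]) / total
        disagreement = sqrt(1.0 - sentiment ** 2)
        signals[date] = (sentiment, disagreement)
    return signals


if __name__ == "__main__":
    sample = [
        ("2015-01-02", 1), ("2015-01-02", 1), ("2015-01-02", -1),
        ("2015-01-05", 1), ("2015-01-05", -1), ("2015-01-05", 0),
    ]
    for day, (s, d) in daily_signals(sample).items():
        print(day, round(s, 3), round(d, 3))
```

Under these assumed definitions, a day with evenly split opinions has zero net sentiment but maximal disagreement, which is why the two signals can relate differently to returns and to volatility.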
Keywords: Natural Language Processing, Sentiment Analysis, Unstructured Social Media Data, Big Data
JEL Classification: G14, C32