On-chain Analytics for Sentiment-driven Statistical Causality in Cryptocurrencies
36 Pages. Posted: 11 Feb 2021. Last revised: 5 Aug 2021
Date Written: December 8, 2020
This paper establishes a new framework for assessing multimodal statistical causality between cryptocurrency market (cryptomarket) sentiment and cryptocurrency price processes. To this end, we present an efficient algorithm for multimodal statistical causality analysis based on Multiple-Output Gaussian Processes. Signals from different information sources (modalities) are jointly modelled as a Multiple-Output Gaussian Process, and then, using a novel approach to statistical causality based on Gaussian Processes (GP), we study linear and non-linear causal effects between the different modalities. We demonstrate the effectiveness of our approach in a machine learning application studying the relationship between cryptocurrency spot price dynamics and sentiment time-series data specific to the crypto sector, which we conjecture influences retail investor behaviour. Investor sentiment is extracted from cryptomarket news data via methods from Natural Language Processing (NLP), an area of statistical machine learning. To capture sentiment, we present a novel framework for text-to-time-series embedding, which we then use to construct a sentiment index from publicly available news articles. We conduct a statistical analysis of our sentiment index model and compare it to alternative state-of-the-art sentiment models popular in the NLP literature. With regard to multimodal causality, investor sentiment is our primary modality of exploration, in addition to price and a blockchain technology-related indicator (hash rate). Our analysis shows that the approach is effective in modelling causal structures of varying degrees of complexity between heterogeneous data sources, and illustrates the impact that certain modelling choices for the different modalities can have on detecting causality.
A solid understanding of these factors is necessary to gauge cryptocurrency adoption by retail investors and provide sentiment- and technology-based insights about the cryptocurrency market dynamics.
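To make the notion of statistical (Granger) causality between a sentiment series and a price series concrete, the following is a minimal sketch of the classical linear, lag-1 Granger test. It is not the Multiple-Output Gaussian Process method described above, which also covers non-linear effects. The data are synthetic, and the names `sentiment`, `returns`, `solve`, `rss` and `granger_f` are illustrative assumptions rather than anything taken from the paper.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f(cause, effect):
    """F statistic for 'cause' Granger-causing 'effect' at lag 1.

    Restricted model:   effect_t ~ 1 + effect_{t-1}
    Unrestricted model: effect_t ~ 1 + effect_{t-1} + cause_{t-1}
    """
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    full = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_f) / (rss_f / dof)

# Synthetic data (assumption): yesterday's sentiment drives today's return.
random.seed(0)
n = 200
sentiment = [random.gauss(0, 1) for _ in range(n)]
returns = [0.0] + [0.8 * sentiment[t - 1] + random.gauss(0, 0.1)
                   for t in range(1, n)]

f_forward = granger_f(sentiment, returns)
f_reverse = granger_f(returns, sentiment)
print(f"sentiment -> returns: F = {f_forward:.1f}")
print(f"returns -> sentiment: F = {f_reverse:.1f}")
```

A large F statistic in one direction and a small one in the other suggests that lagged sentiment improves the prediction of returns but not vice versa. A GP-based test would compare the fit of two nested models in a similar way, with the linear regressions replaced by Gaussian Process models.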
Keywords: Multiple-Output Gaussian Process, Granger causality, sentiment index, sentiment analysis, text mining, multimodal systems, heterogeneous data, cryptocurrencies, cryptocoin markets, natural language processing