More than a Feeling: Benchmarks for Sentiment Analysis Accuracy
24 Pages Posted: 5 Dec 2019 Last revised: 3 Aug 2020
Date Written: July 31, 2020
The written word is the oldest and most common type of data. Today, mass literacy and cheap technology allow for greater word output per capita than ever before in human history. To keep pace, companies and scholars increasingly depend on automated analyses — not only of what people say (content) but also how they feel (sentiment). This makes it pertinent to understand the accuracy of these automated analyses. While information systems research has produced remarkable leaps of progress, the emphasis has been on innovation rather than evaluation. From an applied perspective, it is not clear whether leaderboard results for selected problems generalize across data sets and domains. In this article, we focus on sentiment analysis methods and assess performance across applications by combining a meta-analysis of 216 comparative computer science publications on 271 unique data sets with experimental evaluations of novel language models. To the best of our knowledge, this constitutes the most comprehensive assessment of sentiment analysis accuracy to date. We find that method choice explains only 10% of the variance in accuracy. Controlling for contextual factors such as data set and paper characteristics increases explanatory power to over 75%, suggesting differences across research problems matter. We find that accuracy of sentiment analysis can indeed approach 95% but can also fall below 50%. This shows that more nuanced benchmarks, rather than best attainable values for selected use cases, are more meaningful for an applied audience. We compute benchmark values that take both methodological choices and application context into account.
Keywords: sentiment analysis, meta-analysis, natural language processing, lexicons, machine learning, transfer learning, language models
Suggested Citation: Suggested Citation