Strength in Numbers: Using Big Data to Simplify Sentiment Classification

28 Pages. Posted: 8 Mar 2017

Apostolos Filippas

New York University (NYU) - Department of Information, Operations, and Management Sciences

Theodoros Lappas

Stevens Institute of Technology - School of Business

Date Written: March 7, 2017

Abstract

Sentiment classification, the task of assigning a positive or negative label to a text segment, is a key component of mainstream applications such as reputation monitoring, sentiment summarization, and item recommendation. Even though the performance of sentiment classification methods has steadily improved over time, their ever-increasing complexity renders them comprehensible to only a shrinking minority of expert practitioners. For all others, such highly complex methods are black-box predictors that are hard to tune and even harder to justify to decision-makers. Motivated by these shortcomings, we introduce BigCounter: a new algorithm for sentiment classification that substitutes Big Data for algorithmic complexity. Our algorithm combines standard data structures with statistical testing to deliver accurate and interpretable predictions. It is also parameter-free and suitable for use virtually "out of the box", which makes it appealing for organizations that want to leverage their troves of unstructured data without incurring the significant expense of creating in-house teams of data scientists. Finally, BigCounter’s efficient and parallelizable design makes it applicable to very large datasets. We apply our method to such datasets in a study of the limits of Big Data for sentiment classification. Our study finds that, after a certain point, predictive performance tends to converge and additional data have little benefit. Our algorithmic design and findings provide the foundations for future research on the data-over-computation paradigm for classification problems.
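
The abstract does not spell out how BigCounter works, so the following is only an illustrative sketch of the general idea it names: plain counting structures combined with a statistical test. It is not the authors' algorithm. The function names (train, classify, binom_two_sided_p), the whitespace tokenization, the exact binomial test, and the alpha threshold are all assumptions made for this example; the alpha threshold in particular means this sketch is not parameter-free.

```python
from collections import Counter
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value (hypothetical helper).

    Sums the probabilities of all outcomes that are no more likely than k.
    Intended for small counts only; large n would need a numerically
    stable implementation.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def train(labeled_texts):
    """Count token occurrences separately for "pos" and "neg" texts."""
    pos, neg = Counter(), Counter()
    for text, label in labeled_texts:
        (pos if label == "pos" else neg).update(text.lower().split())
    return pos, neg


def classify(model, text, alpha=0.05):
    """Let each token whose class skew is statistically significant cast one vote."""
    pos, neg = model
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    base_rate = total_pos / (total_pos + total_neg)  # share of positive tokens
    score = 0
    for token in set(text.lower().split()):
        a, b = pos[token], neg[token]
        if a + b == 0:
            continue  # token never seen in training
        if binom_two_sided_p(a, a + b, base_rate) < alpha:
            score += 1 if a / (a + b) > base_rate else -1
    if score == 0:
        return "undecided"
    return "positive" if score > 0 else "negative"
```

Under these assumptions, a model trained on six copies of "great movie" labeled "pos" and six copies of "awful movie" labeled "neg" would label "great movie" as positive: "great" is significantly skewed toward the positive class, while "movie" appears equally in both classes and casts no vote. Every prediction can be traced to the tokens that voted, which is one way to read the interpretability the abstract emphasizes.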

Keywords: Big Data, Sentiment Classification, Opinion Mining, Analytics, Text Analytics

Suggested Citation

Filippas, Apostolos and Lappas, Theodoros, Strength in Numbers: Using Big Data to Simplify Sentiment Classification (March 7, 2017). Available at SSRN: https://ssrn.com/abstract=2929091 or http://dx.doi.org/10.2139/ssrn.2929091

Apostolos Filippas

New York University (NYU) - Department of Information, Operations, and Management Sciences

44 West Fourth Street
New York, NY 10012
United States

Theodoros Lappas (Contact Author)

Stevens Institute of Technology - School of Business

Hoboken, NJ 07030
United States
