A Comprehensive Methodology for Extracting Signal from Social Media Text Using Natural Language Processing and Machine Learning

Proceedings of the Workshop of Information Technology and Systems, Dallas 2015

16 Pages. Posted: 7 Jun 2020

Wenli Zhang

University of Arizona

Sudha Ram

University of Arizona - Department of Management Information Systems

Date Written: 2015

Abstract

There has been increasing interest in using data from social media, search engines, and other web sources for predictive analytics in many different domains. Although using these datasets in different contexts has shown significant promise, mounting evidence suggests that many of the results being produced could be misrepresented because of loosely structured textual data and noise caused by anomalous media spikes and the use of misleading terms and phrases. We introduce a novel and efficient framework combining natural language processing (NLP) and machine learning classification techniques to extract signal from social media text. Our methodology was tested using two different large real-world datasets from social media and resulted in an overall accuracy of 88% and high per-class precision and recall. The methodology described in this paper can be used for a variety of purposes to yield improved analyses of social media and web text, with a view to enabling improved predictions.

Suggested Citation

Zhang, Wenli and Ram, Sudha, A Comprehensive Methodology for Extracting Signal from Social Media Text Using Natural Language Processing and Machine Learning (2015). Proceedings of the Workshop of Information Technology and Systems, Dallas 2015, Available at SSRN: https://ssrn.com/abstract=2880834 or http://dx.doi.org/10.2139/ssrn.2880834

Wenli Zhang (Contact Author)

University of Arizona

Department of History
Tucson, AZ 85721
United States

Sudha Ram

University of Arizona - Department of Management Information Systems

McClelland Hall
Tucson, AZ 85721-0108
United States
520-621-4113 (Phone)
