A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing

Liu, X., Singh, P. V., & Srinivasan, K. (2016). A structured analysis of unstructured big data by leveraging cloud computing. Marketing Science, 35(3), 363-388.

37 pages. Posted: 20 Jul 2020. Last revised: 8 Sep 2020.

Xiao Liu

New York University (NYU) - Leonard N. Stern School of Business

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

Kannan Srinivasan

Carnegie Mellon University

Date Written: September 30, 2015

Abstract

Accurate forecasting of sales and consumption is particularly important in marketing because the forecasts can be used to adjust marketing budget allocations and overall marketing strategy. Recently, online social platforms have produced an unparalleled amount of data on consumer behavior. However, two challenges have limited the use of these data for obtaining meaningful marketing insights. First, the data are typically unstructured: text, images, audio, and video. Second, the sheer volume of the data makes standard analysis procedures computationally unworkable. In this study, we combine methods from cloud computing, machine learning, and text mining to illustrate how content from online platforms such as Twitter can be used effectively for forecasting. We conduct our analysis on nearly two billion Tweets and 400 billion Wikipedia pages. Our main findings show that, in contrast to basic surface-level measures such as the volume of Tweets or the sentiment in them, the information content of Tweets and their timeliness significantly improve forecasting accuracy. Our method endogenously summarizes the information in Tweets; its advantage is that the classification of Tweets is based on what is in the Tweets rather than on preconceived topics that may not be relevant. We also find that, in contrast to Twitter, other online data (e.g., Google Trends, Wikipedia views, IMDB reviews, and Huffington Post news) are very weak predictors of TV show demand because users tweet about TV shows before, during, and after a broadcast, whereas Google searches, Wikipedia views, IMDB reviews, and news posts typically lag behind the show.
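
The contrast between surface-level volume and the information content of Tweets can be illustrated with a minimal sketch. It assumes a plain bag-of-words count as a stand-in for the text-mining step, which the abstract does not specify; the toy Tweets, the stopword list, and the function names are all hypothetical and are not taken from the paper. The vocabulary is learned from the Tweets themselves rather than from a preset topic list.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this toy example.
STOPWORDS = {"the", "to", "was"}

def tokenize(text):
    """Lowercase a tweet, split it into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def learn_vocabulary(tweets, size):
    """Pick the `size` terms that appear in the most tweets (no preset topics)."""
    doc_freq = Counter()
    for tweet in tweets:
        doc_freq.update(set(tokenize(tweet)))
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:size]]

def content_features(tweets, vocabulary):
    """Count how often each vocabulary term occurs across a batch of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return [counts[term] for term in vocabulary]

# Hypothetical toy data: two days with the same tweet volume but different content.
day1 = ["cannot wait to watch the finale tonight", "going to watch the finale live"]
day2 = ["the finale was boring", "skipped the finale"]

vocab = learn_vocabulary(day1 + day2, size=3)
volume = [len(day1), len(day2)]
features = [content_features(day1, vocab), content_features(day2, vocab)]

print(vocab)     # terms learned from the data
print(volume)    # identical for both days
print(features)  # differs between the two days
```

In this toy case the volume feature cannot tell the two days apart, while the content features can. A forecasting model would take such content features as inputs; that step is outside the scope of this sketch.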

Keywords: big data; cloud computing; text mining; user generated content; Twitter; Google Trends

Suggested Citation

Liu, Xiao and Singh, Param Vir and Srinivasan, Kannan, A Structured Analysis of Unstructured Big Data by Leveraging Cloud Computing (September 30, 2015). Marketing Science, 35(3), 363-388 (2016). Available at SSRN: https://ssrn.com/abstract=3635140

Xiao Liu (Contact Author)

New York University (NYU) - Leonard N. Stern School of Business

Suite 9-160
New York, NY
United States

Param Vir Singh

Carnegie Mellon University - David A. Tepper School of Business

5000 Forbes Avenue
Pittsburgh, PA 15213-3890
United States
412-268-3585 (Phone)

Kannan Srinivasan

Carnegie Mellon University

Pittsburgh, PA 15213-3890
United States
