Attribute Sentiment Scoring with Online Text Reviews: Accounting for Language Structure and Attribute Self-Selection
55 Pages. Posted: 29 May 2019. Last revised: 12 Jun 2019.
Date Written: May 27, 2019
The authors address two novel and significant challenges in using online text reviews to obtain attribute-level ratings. First, they introduce the problem of inferring attribute-level sentiment from text data to the marketing literature and develop a deep learning model to address it. While extant bag-of-words-based topic models are fairly good at attribute discovery based on the frequency of word or phrase occurrences, associating sentiments with attributes requires exploiting the spatial and sequential structure of language. Second, they illustrate how to correct for attribute self-selection (reviewers choose the subset of attributes to write about) in metrics of attribute-level restaurant performance. Using Yelp.com reviews for empirical illustration, they find that a hybrid deep learning (CNN-LSTM) model, in which the CNN and LSTM exploit the spatial and sequential structure of language, respectively, provides the best performance in accuracy, training speed, and training data size requirements. The model does particularly well on the “hard” sentiment classification problems. Further, accounting for attribute self-selection significantly impacts sentiment scores, especially on attributes that are frequently missing.
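The point about language structure can be illustrated with a small sketch. It is not the authors' model: the two reviews, the two-word lexicon, and the helper names `bag_of_words` and `nearest_sentiment` are hypothetical and chosen only for illustration. The sketch shows that two reviews with opposite attribute-level sentiment can share an identical bag of words, while even a crude order-aware rule tells them apart.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts for a review, discarding word order."""
    return Counter(text.lower().replace(",", "").split())

# Hypothetical reviews: same words, opposite attribute-level sentiment.
review_a = "the food was great, the service was terrible"
review_b = "the food was terrible, the service was great"

# A bag-of-words representation cannot distinguish the two reviews.
same_bag = bag_of_words(review_a) == bag_of_words(review_b)

# Toy sentiment lexicon (an assumption, not taken from the paper).
LEXICON = {"great": 1, "terrible": -1}

def nearest_sentiment(text, attribute, lexicon):
    """Toy order-aware rule: score of the first lexicon word after the attribute."""
    tokens = text.lower().replace(",", "").split()
    start = tokens.index(attribute)
    for token in tokens[start + 1:]:
        if token in lexicon:
            return lexicon[token]
    return None  # no sentiment word follows the attribute

food_a = nearest_sentiment(review_a, "food", LEXICON)
food_b = nearest_sentiment(review_b, "food", LEXICON)
```

Here `same_bag` is true, yet `food_a` and `food_b` differ in sign. A learned sequence model such as the CNN-LSTM described in the abstract would replace the hand-written rule; the rule only demonstrates why word order carries the attribute-sentiment link.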
Keywords: Text mining, Natural language processing (NLP), Convolutional neural networks (CNN), Long short-term memory (LSTM) networks, Deep learning, Lexicons, Endogeneity, Self-selection, Online reviews, Online ratings, Customer satisfaction
JEL Classification: M1, M3, C8, C5