Exploring Performance of Clustering Methods on Document Sentiment Analysis

Journal of Information Science, 2017, 43(1): 54-74

21 Pages. Posted: 1 Feb 2016. Last revised: 10 Feb 2017

Baojun Ma

School of Business and Management, Shanghai International Studies University

Hua Yuan

University of Electronic Science and Technology of China (UESTC) - School of Economics and Management

Ye Wu

Beijing University of Posts and Telecommunications (BUPT) - School of Economics and Management

Date Written: December 3, 2015

Abstract

Clustering is a powerful unsupervised tool for sentiment analysis of text. However, clustering results can be affected by every step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model, and the clustering algorithm. This paper presents an experimental study of several common clustering techniques applied to the task of sentiment analysis. Unlike previous studies, we investigate the combined effects of these factors through a series of comprehensive experiments. The results indicate, first, that K-means-type clustering algorithms show clear advantages on balanced review datasets but perform rather poorly on unbalanced datasets in terms of clustering accuracy. Second, the comparatively recently designed weighting models outperform the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, extracting adjectives and adverbs clearly improves clustering performance, whereas stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.

Keywords: Clustering, data pre-processing, sentiment analysis, term weighting model
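
The pipeline named in the abstract (term weighting in a Vector Space Model, a K-means-type algorithm, and evaluation by clustering accuracy) can be illustrated with a minimal sketch. This is not the authors' experimental setup: it assumes plain TF-IDF as a stand-in for a traditional weighting model, whitespace tokenization with no stemming or stopword removal, a four-review toy corpus invented for illustration, and hypothetical helper names (tfidf_vectors, kmeans, clustering_accuracy).

```python
import math
from collections import Counter
from itertools import permutations


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors (assumed weighting, for illustration)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def kmeans(vectors, k, seeds, iters=20):
    """Plain K-means; initial centroids are the documents at the given indices."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_accuracy(labels, truth, k):
    """Best accuracy over all one-to-one mappings of cluster ids to classes."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(perm[lab] == t for lab, t in zip(labels, truth))
        best = max(best, hits)
    return best / len(truth)


# Toy balanced corpus (invented): 1 = positive review, 0 = negative review.
docs = [
    "great great excellent film",
    "excellent wonderful great acting",
    "terrible awful boring film",
    "boring awful terrible plot",
]
truth = [1, 1, 0, 0]

vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2, seeds=[0, 2])
accuracy = clustering_accuracy(labels, truth, k=2)
print(labels, accuracy)
```

Because cluster ids are arbitrary, the accuracy function tries every mapping of clusters to classes; for two clusters this means the score can never fall below 0.5, which is worth keeping in mind when reading accuracy figures on unbalanced data.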

Suggested Citation

Ma, Baojun and Yuan, Hua and Wu, Ye, Exploring Performance of Clustering Methods on Document Sentiment Analysis (December 3, 2015). Journal of Information Science, 2017, 43(1): 54-74, Available at SSRN: https://ssrn.com/abstract=2725158

Baojun Ma (Contact Author)

School of Business and Management, Shanghai International Studies University

1550 Wen Xiang Rd.
Songjiang District
Shanghai, Shanghai 201620
China

HOME PAGE: http://baojunma.com/index_en.html

Hua Yuan

University of Electronic Science and Technology of China (UESTC) - School of Economics and Management

No. 4 Section 2
North Jianshe Road
Chengdu, Si Chuan 610054
China

Ye Wu

Beijing University of Posts and Telecommunications (BUPT) - School of Economics and Management

10 Xi Tu Cheng Rd.
Mailbox 164
Beijing, Beijing 100876
China
