Automated Detection of Chinese Government Astroturfers Using Network and Social Metadata
35 Pages. Posted: 28 Feb 2016. Last revised: 31 Jan 2018.
Date Written: April 21, 2016
Astroturfing is the practice of an organization communicating a message using fake “grass-roots” sources. Because these messages attempt to mimic ordinary individuals, distinguishing them from real grass-roots messages is a difficult task. In this paper, I present a method for automatically detecting pro-government astroturfers in China (colloquially referred to as the Fifty Cent Party), using comment metadata from a dataset of 70 million news media comments posted on 6 million news articles from 19 popular news websites in China. I estimate that approximately 15% of all comments made on these 19 news websites are made by government astroturfers. This method of comment propaganda detection is automated and does not require manual labeling. Instead, data are labeled according to metadata characteristic of the work procedures and behavioral patterns of government astroturfers. Models trained on these metadata predict posts from a leaked dataset of government astroturfers with accuracy as high as 94.1%. This method gives researchers timely access to government astroturfer commentary from China. It also allows prediction of astroturfers’ bureaucratic affiliation using social network data, letting researchers explore variance in how this information control tactic is deployed across different bureaucracies and localities in China. Finally, it suggests a forensic method for detecting astroturfers in other countries and on other online platforms.
Keywords: propaganda, China, authoritarian politics, media, political communication, 50 cent party, NLP, machine learning, natural language processing, text analysis, automated text analysis