A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality
6 Pages Posted: 8 Jan 2021
Date Written: December 30, 2020
Abstract
In this paper we introduce a framework for annotating social media text corpora for various categories. Since social media data is generated by individuals, it is important to annotate the text with those individuals' demographic attributes to enable a socio-technical analysis of the corpus. Furthermore, when analyzing a large dataset, we can often annotate a small sample of the data and then train a prediction model on this sample to annotate the full dataset for the relevant categories. We use a case study of a Facebook comment corpus on student loan discussions, which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberalistic views, and civility of the comment.
Keywords: Higher Education, Tuition Free College, Machine Learning, Bag of Words
JEL Classification: C67, C88, I23, I24, I25, I28
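The sample-then-predict workflow described in the abstract (hand-annotate a small sample, train a model, label the rest) can be illustrated with a minimal sketch. The abstract does not say which model the authors used; the bag-of-words multinomial naive Bayes classifier below is an assumption chosen for simplicity. The comments, the "support"/"oppose" stance labels, and all function names are hypothetical and invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately simple bag-of-words tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_comments):
    """Fit a multinomial naive Bayes model on (text, label) pairs from the annotated sample."""
    label_counts = Counter()            # number of comments per label
    word_counts = defaultdict(Counter)  # per-label token frequencies
    vocabulary = set()
    for text, label in labeled_comments:
        label_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the most probable label for one comment, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total_comments = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total_comments)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in the annotated sample
            smoothed = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-annotated sample (stance labels are illustrative, not from the paper).
annotated_sample = [
    ("cancel student debt now", "support"),
    ("forgive the loans for everyone", "support"),
    ("people should repay what they borrowed", "oppose"),
    ("taxpayers should not pay for your loans", "oppose"),
]

# Hypothetical unannotated remainder of the corpus.
unlabeled_comments = [
    "forgive student debt",
    "people should repay borrowed money",
]

model = train_naive_bayes(annotated_sample)
predictions = [predict(model, comment) for comment in unlabeled_comments]
print(list(zip(unlabeled_comments, predictions)))
```

In practice each annotation category (stance, civility, topicality, and so on) would need its own labeled sample and its own model, and predictions should be checked against a held-out portion of the annotated data before being applied to the full corpus.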