A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality

6 Pages Posted: 8 Jan 2021

Shubhanshu Mishra

University of Illinois at Urbana-Champaign

Daniel Collier

University of Memphis

Date Written: December 30, 2020

Abstract

In this paper, we introduce a framework for annotating social media text corpora for various categories. Since social media data is generated by individuals, it is important to annotate the text with the individuals' demographic attributes to enable a socio-technical analysis of the corpus. Furthermore, when analyzing a large dataset, we can often annotate a small sample of the data, train a prediction model on this sample, and then use the model to annotate the full dataset for the relevant categories. We present a case study of a corpus of Facebook comments discussing student loans, which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberal views, and civility of each comment.
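The sample-then-predict workflow described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name its model, so this sketch assumes a multinomial naive Bayes classifier over bag-of-words counts (the keywords mention bag of words). The functions `train_nb` and `predict_nb` and the toy stance-labeled comments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization (bag of words).
    return text.lower().split()


def train_nb(texts, labels, alpha=1.0):
    """Fit a hypothetical multinomial naive Bayes model on the annotated sample."""
    class_counts = Counter(labels)      # number of documents per label
    word_counts = defaultdict(Counter)  # per-label token counts
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, text):
    """Return the label with the highest log-posterior for one unlabeled text."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in the annotated sample
                # add-alpha (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-annotated sample (stance toward student loan forgiveness).
sample = [
    ("forgive student loans now", "support"),
    ("cancel the debt it is crushing students", "support"),
    ("borrowers should repay what they owe", "oppose"),
    ("taxpayers should not pay for your loans", "oppose"),
]
# Hypothetical remainder of the corpus that was not hand-annotated.
unlabeled = ["please cancel student debt", "you should repay your loans"]

model = train_nb([t for t, _ in sample], [y for _, y in sample])
predicted = {text: predict_nb(model, text) for text in unlabeled}
print(predicted)
```

In practice, one such model would be trained per annotated category (e.g., stance, civility, political leaning), and its accuracy would be checked on held-out annotated comments before trusting the labels it assigns to the full corpus.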

Keywords: Higher Education, Tuition Free College, Machine Learning, Bag of Words

JEL Classification: C67, C88, I23, I24, I25, I28

Suggested Citation

Mishra, Shubhanshu and Collier, Daniel, A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality (December 30, 2020). Available at SSRN: https://ssrn.com/abstract=3757554 or http://dx.doi.org/10.2139/ssrn.3757554

Shubhanshu Mishra (Contact Author)

University of Illinois at Urbana-Champaign

601 E John St
Champaign, IL 61820
United States

Daniel Collier

University of Memphis

Memphis, TN 38152-3370
United States
