Text Scaling for Open-Ended Survey Responses and Social Media Posts
60 Pages. Posted: 30 Sep 2017. Last revised: 9 Aug 2019.
Date Written: August 7, 2019
Open-ended survey responses and social media posts contain valuable information about public opinion, but can consist of only a handful of words. This succinctness makes them hard to summarize, especially when the vocabulary size across all respondents is large. Here, we propose a method to characterize and score opinions in these data. The approach scores respondents' opinion justifications based on their use of common words and the words that tend to accompany them, so that we can summarize opinions without relying on rarely used vocabulary. This common-word regularization identifies keywords for interpreting text dimensions, and can incorporate information from pre-trained word embeddings to more reliably estimate low-dimensional attitudes in small samples. We apply the method to open-ended survey responses on the Affordable Care Act and partisan animus, as well as to Twitter accounts linked to Russian intelligence, to evaluate whether the method produces compact text dimensions. In contrast to the unsupervised techniques used for comparison, the top dimensions identified by this method are the best predictors of issue attitudes, vote choice, and partisanship. Although the method estimates issue-specific "gist" scales that differ in important ways from ideological scales trained on politicians' texts, multi-dimensionality and extremity on these scales are nonetheless associated with opinion change.