Text Scaling for Open-Ended Survey Responses and Social Media Posts

60 Pages · Posted: 30 Sep 2017 · Last revised: 9 Aug 2019

Date Written: August 7, 2019


Open-ended survey responses and social media posts contain valuable information about public opinion, but each may consist of only a handful of words. This succinctness makes them hard to summarize, especially when the vocabulary across all respondents is large. Here, we propose a method to characterize and score opinions in these data. The approach scores respondents' opinion justifications based on their use of common words and the words that tend to accompany them, so that opinions can be summarized without relying on rarely used vocabulary. This common word regularization identifies keywords for interpreting text dimensions, and it can incorporate information from pre-trained word embeddings to more reliably estimate low-dimensional attitudes in small samples. We apply the method to open-ended survey responses on the Affordable Care Act and partisan animus, as well as to Russian intelligence-linked Twitter accounts, to evaluate whether the method produces compact text dimensions. Unlike competing unsupervised techniques, the top dimensions identified by this method are the best predictors of issue attitudes, vote choice, and partisanship. Although the method estimates issue-specific "gist" scales that differ in important ways from ideological scales trained on politicians' texts, multi-dimensionality and extremity on these scales are nonetheless associated with opinion change.
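The core idea of scaling short texts on a common-word vocabulary can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's estimator: it restricts the document-term matrix to words shared across several responses (a stand-in for the common word regularization described above) and extracts a one-dimensional scale and its keywords via a singular value decomposition. The example documents and the frequency threshold `min_df` are invented for illustration.

```python
import numpy as np

# Toy open-ended responses (invented for illustration).
docs = [
    "keep the health law",
    "repeal the health law",
    "keep the law it helps",
    "repeal it costs too much",
]

# Tokenize and keep only "common" words: those appearing in
# at least min_df documents, so rare vocabulary is dropped.
min_df = 2
tokenized = [d.split() for d in docs]
doc_freq = {}
for toks in tokenized:
    for w in set(toks):
        doc_freq[w] = doc_freq.get(w, 0) + 1
vocab = sorted(w for w, c in doc_freq.items() if c >= min_df)

# Document-term count matrix over the common vocabulary.
X = np.array([[toks.count(w) for w in vocab] for toks in tokenized], float)

# Center columns; the first right-singular vector gives one
# low-dimensional scale, and its largest loadings give keywords.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                                  # one score per response
keywords = [vocab[i] for i in np.argsort(-np.abs(Vt[0]))[:3]]
```

In this sketch, rare words such as "helps" or "costs" never enter the vocabulary, so they cannot drive the scale; the paper's approach additionally lets pre-trained embeddings inform which words accompany the common ones.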

Suggested Citation

Hobbs, William R., Text Scaling for Open-Ended Survey Responses and Social Media Posts (August 7, 2019). Available at SSRN: https://ssrn.com/abstract=3044864 or http://dx.doi.org/10.2139/ssrn.3044864

William R. Hobbs (Contact Author)

Cornell University

Ithaca, NY 14853
United States
