Large Scale Online Text Analysis Using Lydia
35 Pages. Posted: 19 Jul 2010. Last revised: 15 Sep 2010.
Date Written: 2010
Political scientists have recognized that the news media serve as an important intervening variable between political issues and public opinion. The media can reinforce existing attitudes (Lazarsfeld, Berelson, and Gaudet 1944), increase the salience of issues (Iyengar and Kinder 1987; Romer, Jamieson, and Aday 2003), and set the agenda for public discourse through the priming and framing of issues (Iyengar 1987; Druckman 2001; Krosnick and Kinder 1990). Media coverage can also aid citizens in the evaluation of political candidates (Aldrich and Alvarez 1994; Ansolabehere and Iyengar 1994; Mendelsohn 1996) and influence voting decisions (Iyengar 1990a; Mendelsohn 1996).
The Lydia project is designed to tame the tide of information emanating from news media and blogs in the US (and around the world) by building a relational model of people, places, and things through natural language processing (NLP) of news sources and statistical analysis of entity frequencies and co-locations. In this paper, we introduce the Lydia system and its method of collecting and classifying entities, describe each phase of the process, and present a practical application for political scientists: assessing the value of the Lydia system in predicting presidential approval for different partisan groups over time. Modeling presidential approval as a function of media sentiment toward the president, and controlling for inflation and unemployment, we find that media sentiment toward the president has a significant effect on presidential approval for different partisan groups. We also find that media sentiment toward the president responds to changes in presidential approval.
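As a rough illustration of what "statistical analysis of entity frequencies and co-locations" can mean in practice, the sketch below counts how often each entity appears and how often pairs of entities appear in the same article. It is not the Lydia implementation: the function name `entity_statistics`, the per-article counting rule, and the toy input are assumptions made for this example, and the entity extraction (the NLP step) is taken as already done.

```python
from collections import Counter
from itertools import combinations


def entity_statistics(documents):
    """Count entity frequencies and pairwise co-occurrences.

    `documents` is a list of entity-name lists, one list per article.
    Assumption: each entity is counted at most once per article, and two
    entities "co-locate" when they appear in the same article.
    """
    frequency = Counter()
    cooccurrence = Counter()
    for entities in documents:
        unique = sorted(set(entities))  # deduplicate and fix pair ordering
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))
    return frequency, cooccurrence


# Toy input invented for illustration; not data from the paper.
docs = [
    ["Barack Obama", "Congress", "Washington"],
    ["Barack Obama", "Washington"],
    ["Congress"],
]
freq, pairs = entity_statistics(docs)
```

Sorting each article's entity set before pairing means a given pair is always stored under one key, so `pairs[("Barack Obama", "Washington")]` collects every article in which both names occur.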