Recounting the Courts? Applying Automated Content Analysis to Enhance Empirical Legal Research
25 Pages Posted: 6 Jul 2006
Date Written: August 28, 2006
Political scientists in general and public law specialists in particular have only recently begun to exploit text classification using machine learning techniques to enable the reliable and detailed content analysis of political/legal documents on a large scale. This paper provides an overview and assessment of this methodology. We describe the basics of text classification, suggest applications of this technique to enhance empirical legal research (and political science more broadly), and report results of experiments designed to test the strengths and weaknesses of alternative text classification models for classifying the positions and interpreting the content of briefs submitted to the U.S. Supreme Court. We find that the Wordscores method (introduced by Laver, Benoit, et al. 2003) and various models using a Naïve Bayes classifier perform well at accurately classifying the ideological direction of amicus curiae briefs submitted in the Bakke (1978) and Bollinger (2003) affirmative action cases. We also find that automated feature selection techniques can enable the detection of disparate issue conceptualizations by opposing sides in a single case, and facilitate analysis of relative linguistic "reliance" and "dominance" over time. We conclude by discussing the implications of our results and pointing to areas where technical and infrastructural improvements are most needed.
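To make the kind of classifier named in the abstract concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the authors' implementation: the function names (`train_naive_bayes`, `classify`), the labels `"pro"` and `"con"`, and the four toy training sentences are all hypothetical and invented for illustration, and whitespace tokenization stands in for whatever preprocessing the paper actually used.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model from (label, text) pairs.

    Returns (log_priors, word_counts, vocab). Tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    doc_counts = Counter()
    word_counts = {}
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())

    # The vocabulary is every word seen in any training document.
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)

    total_docs = sum(doc_counts.values())
    log_priors = {label: math.log(n / total_docs) for label, n in doc_counts.items()}
    return log_priors, word_counts, vocab


def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    log_priors, word_counts, vocab = model
    scores = {}
    for label, log_prior in log_priors.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing: each vocabulary word gets one extra count.
        denom = sum(counts.values()) + len(vocab)
        score = log_prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy corpus; these sentences are invented, not drawn from any brief.
toy_docs = [
    ("pro", "diversity is a compelling interest in admissions"),
    ("pro", "race conscious admissions promote diversity"),
    ("con", "racial quotas violate equal protection"),
    ("con", "quotas discriminate and violate the constitution"),
]
model = train_naive_bayes(toy_docs)
```

Calling `classify(model, "diversity in admissions")` on this toy model scores the text under each label and returns the higher-scoring one. A real application would train on full briefs with known positions and evaluate on held-out documents.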
Keywords: computational linguistics, machine learning, content analysis, amicus curiae, legal rhetoric