Corpus Linguistics in Legal Interpretation: When Is It (In)appropriate?

53 Pages. Posted: 25 Feb 2019

Neal Goldfarb

Georgetown University Law Center

Date Written: February 5, 2019


Corpus linguistics can be a powerful tool in legal interpretation, but like all tools, it is suited to some uses and not to others. At a minimum, that means there are likely to be cases in which corpus data doesn’t yield any useful insights. More seriously, in some cases where the data seems useful, that appearance might prove on closer examination to be misleading. So it is important for people to be able to distinguish issues on which corpus results are genuinely useful from those on which they are not. A big part of the motivation behind introducing corpus linguistics into legal interpretation is to increase the sophistication and quality of interpretive analysis. That purpose will be disserved if corpus data is cited in support of conclusions that the data doesn’t really support.

This paper is an initial attempt to deal with the problem of distinguishing uses of corpus linguistics that can yield useful data from those that cannot. In particular, the paper addresses a criticism that has been made of the use of corpus linguistics in legal interpretation: namely, that the hypothesis underlying the legal-interpretive use of frequency data is flawed. That hypothesis, according to one of the critics, is that “where an ambiguous term retains two plausible meanings, the ordinary meaning of the term... is the more frequently used meaning[.]” (Although that description is not fully accurate, it will suffice for present purposes.)

The asserted flaw in this hypothesis is that differences in the frequencies of different senses of a word might be due to “reasons that have little to do with the ordinary meaning of that word.” Such differences, rather than reflecting the “sense of a word or phrase that is most likely implicated in a given linguistic context,” might instead reflect at least in part “the prevalence or newsworthiness of the underlying phenomenon that the term denotes.” That argument is referred to in this paper as the Purple-Car Argument, based on a skeptical comment about the use of corpus linguistics in legal interpretation: “If the word ‘car’ is ten times more likely to co-occur with the word ‘red’ than with the word ‘purple,’ it would be ludicrous to conclude from this data that a purple car is not a ‘car.’”

This paper deals with the Purple-Car Argument in two ways. First, it attempts to clarify the argument’s scope by showing that there are ways of using corpus linguistics that do not involve frequency analysis and that are therefore not even arguably subject to the Purple-Car Argument. The paper offers several case studies illustrating such uses.

Second, the paper acknowledges that when frequency analysis is in fact used, there will be cases that do implicate the flaw that the Purple-Car Argument identifies. The problem, therefore, is to figure out how to distinguish these Purple-Car cases from cases in which the Purple-Car Argument does not apply. The paper discusses some possible methodologies that might be helpful in making that determination. It then presents three case studies, focusing on cases that are well known to those familiar with the law-and-corpus-linguistics literature: Muscarello v. United States, State v. Rasabout, and People v. Harris. The paper concludes that the Purple-Car Argument does not apply to Muscarello, that it does apply to Rasabout, and that a variant of the argument applies to the dissenting opinion in Harris.

Keywords: legal interpretation, corpus linguistics

Suggested Citation

Goldfarb, Neal, Corpus Linguistics in Legal Interpretation: When Is It (In)appropriate? (February 5, 2019). Available at SSRN:

Neal Goldfarb (Contact Author)

Georgetown University Law Center

Washington, DC

