Data-Driven Originalism

79 Pages. Posted: 14 Sep 2017. Last revised: 31 Mar 2019.

Thomas R. Lee

Brigham Young University - J. Reuben Clark Law School

James Cleith Phillips

Brigham Young University

Date Written: January 27, 2018


The threshold question for all originalist methodologies concerns the original communicative content of the words of the Constitution. For too long this inquiry has been pursued through tools that are ill-suited to the task. Dictionaries generally just define individual words; they don’t typically define phrases or allow for the consideration of broader linguistic context. And while dictionaries can provide a list of possible senses, they can’t tell us which sense is the most ordinary (or common). Founding-era dictionaries, moreover, were generally the work of one individual, tended to plagiarize each other, and relied on famous, often dated examples of English usage (from Shakespeare or the King James Bible).

Originalists have also turned to examples of usage in founding-era documents. This approach can address some of the shortcomings of dictionaries; a careful inquiry into sample sentences from founding-era literature can consider a wide range of semantic context. Yet even here the standard inquiry falls short in two ways. First, originalists tend to turn only to certain sources, such as the Federalist Papers or the records of the state constitutional conventions, and those sources may not fully reflect how ordinary users of English of the day would have understood the Constitution (or at least how they would have used language). Second, the number of founding-era documents relied on is often rather small, especially for generalizing about an entire country (or profession, in the case of lawyers). This opens originalists up to charges of cherry-picking; and even where no cherry-picking has occurred, the sample sizes are simply too small to answer originalist questions with confidence.

But all is not lost. Big data, and the tools of linguists, have the potential to bring greater rigor and transparency to the practice of originalism. This article will explore the application of corpus linguistic methodology to aid originalism’s inquiry into the original communicative content of the Constitution. We propose to improve this inquiry by use of a newly released corpus (or database) of founding-era texts: the beta version of the Corpus of Founding-Era American English. The initial beta version will contain approximately 150 million words, derived from the Evans Early American Imprint Series (books, pamphlets and broadsides by all types of Americans on all types of subjects), the National Archives Founders Online Project (the papers of Washington, Franklin, Adams, Jefferson, Madison, and Hamilton, including correspondence to them), and Hein Online’s Legal Database (cases, statutes, legislative debates, etc.).

The paper will showcase how typical tools of a corpus—concordance lines, collocation, clusters (or n-grams), and frequency data—can aid in the search for original communicative content. We will also show how corpus data can help determine whether a word or phrase in question is best thought of as an ordinary one or a legal term of art. To showcase corpus linguistic methodology, the paper will analyze important clauses in the Constitution that have generated litigation and controversy over the years (commerce, public use, and natural born) and another whose original meaning has been presumed to be clear (domestic violence). We propose best practices, and also discuss the limitations of corpus linguistic methodology for originalism.
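As a rough illustration of the four corpus tools named above, the sketch below computes concordance lines, collocates, clusters (n-grams), and frequency counts over a tiny text. It is a minimal sketch under stated assumptions, not the authors' method or the interface of the Corpus of Founding-Era American English: the sample text is invented for illustration, the function names are hypothetical, and collocation is approximated by raw co-occurrence counts within a fixed window rather than by a statistical association measure.

```python
import re
from collections import Counter

# Hypothetical toy text, invented for illustration only.
# It is NOT drawn from any founding-era source or corpus.
TOY_TEXT = (
    "The commerce of the colonies depends on trade. "
    "Foreign commerce and domestic trade enrich the nation. "
    "Congress may regulate commerce with foreign nations."
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines: up to `width` tokens on each side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, keyword, window=2):
    """Count words that appear within `window` tokens of the keyword (raw counts)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts


def ngrams(tokens, n=2):
    """Count contiguous n-token sequences (clusters)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


tokens = tokenize(TOY_TEXT)
frequency = Counter(tokens)

if __name__ == "__main__":
    print("Frequency of 'commerce':", frequency["commerce"])
    for line in concordance(tokens, "commerce"):
        print(line)
    print("Top collocates:", collocates(tokens, "commerce").most_common(3))
    print("Top bigrams:", ngrams(tokens, 2).most_common(3))
```

On a real corpus the same four views would be read together: frequency data shows how often a term occurs, concordance lines expose each use in context so that senses can be coded by hand, and collocates and clusters indicate which words habitually accompany the term.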

Larry Solum has predicted that “corpus linguistics will revolutionize statutory and constitutional interpretation.” Our paper seeks to chart out the first steps of that revolution so that others may follow.

Keywords: Originalism, Constitutional Interpretation, Corpus Linguistics, Dictionaries, public use, commerce, natural born, domestic violence

Suggested Citation

Lee, Thomas R. and Phillips, James Cleith, Data-Driven Originalism (January 27, 2018). University of Pennsylvania Law Review, Forthcoming. Available at SSRN.

Thomas R. Lee

Brigham Young University - J. Reuben Clark Law School

519 JRCB
Brigham Young University
Provo, UT 84602
United States

James Cleith Phillips (Contact Author)

Brigham Young University

Provo, UT 84602
United States
