Big Data Comes for Textualism: The Use and Abuse of Corpus Linguistics in Second Amendment Litigation

141 Pages. Posted: 11 Aug 2021

Mark W. Smith

University of Oxford - Department of Pharmacology; The King's College; Ave Maria University - Ave Maria School of Law

Dan M. Peterson


Date Written: July 14, 2021


Some scholars, judges, and advocates have recently urged that legal corpus linguistics, a methodology that uses computerized searches of large bodies of text known as “corpora,” can determine the original meaning of constitutional provisions. In particular, some of these advocates have argued that corpus linguistics searches of Founding-era corpora prove that the Second Amendment right to keep and bear arms protects only a collective, militia right and not an individual, private right to arms, contrary to the Supreme Court’s interpretation of that amendment in District of Columbia v. Heller, 554 U.S. 570 (2008).

In this article, we argue that relying on corpus linguistics to determine the meaning of the Second Amendment suffers from severe conceptual and practical difficulties. One of the most fundamental flaws concerns the central methodological assumption of corpus linguistics, the “frequency hypothesis,” which posits that the most frequent meaning of a word or phrase returned by a corpus search should be the meaning adopted for purposes of constitutional interpretation. Even if the phrase “bear arms” most frequently appears in a military context, that does not mean that the constitutional language excludes an individual right to bear arms for self-defense and other private purposes. Military and militia references were more likely to appear in public discussions of the right to bear arms simply because they were more “newsworthy” than the mundane acts of ordinary people carrying a firearm for hunting or defense, which would rarely be recorded. Contemporaneous examples, including references by the Founders themselves, show that the right to “bear arms” encompassed an individual right as well as the furthering of a well-regulated militia.

In addition, corpus linguistics suffers from serious problems concerning the composition of the corpora, which are biased in favor of elite language usage and are critically incomplete, missing some of the key texts that historians and legal scholars have long relied upon in discerning the Second Amendment’s meaning. Use of legal corpus linguistics also raises serious practical difficulties in actual constitutional litigation, including the absence of the usual safeguards applicable to expert or “scientific” evidence.

In the end, the counting of words resulting from a corpus search cannot overcome the history and traditions at the time of the Founding that allowed free carry and use of firearms, and the core conception by the Founders that self-protection with arms is a pre-existing right that cannot be taken away from the individual by any act of civil society.

Keywords: Second Amendment, Corpus Linguistics

Suggested Citation

Smith, Mark W. and Peterson, Dan M., Big Data Comes for Textualism: The Use and Abuse of Corpus Linguistics in Second Amendment Litigation (July 14, 2021).

Mark W. Smith (Contact Author)

University of Oxford - Department of Pharmacology

Mansfield Rd
Oxford OX1 3QT,
United Kingdom

The King's College

56 Broadway
New York, NY 10004
United States

Ave Maria University - Ave Maria School of Law

1025 Commons Circle
Naples, FL 34119
United States

Dan M. Peterson

