Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google

41 Pages Posted: 16 Oct 2012

Matthew Jockers

Washington State University

Matthew Sag

Emory University School of Law

Jason Schultz

New York University School of Law

Date Written: August 3, 2012


How Copyright Law Could Make or Break the Future for Digital Humanities.

This case raises many legal, technical, and epistemological issues related to the future of higher education, research, and scholarship, especially efforts that seek to take advantage of “big data” analytics and methodologies. Advances in computer technology and the availability of digital texts will give humanities scholars the chance to do what biologists, physicists, and economists have been doing for decades: analyze massive amounts of data. Large-scale quantitative projects, such as those under way at the Stanford Literary Lab, are unearthing previously unknowable information about individual works and entire genres of literature.

Researchers working in Information Retrieval frequently use text mining and computer-aided classification to identify and retrieve relevant documents. Using similar techniques, researchers in the Digital Humanities can identify and retrieve relevant texts, often from unlikely places. Humanities researchers can thereby expand their traditional study of a few canonical works to any of the several million books in the larger archive of literary history, an archive that has until now remained hidden because of the limits of human reading capacity.

In this amicus brief, scholars from disciplines including law, computer science, linguistics, history, and literature ask the court to consider the impact on this vital area of research when ruling on the legality of mass digitization. Specifically, the brief addresses whether United States copyright law should stand as an obstacle to statistical and computational analysis of the millions of books owned by the nation’s great university libraries.

The brief argues that, just as copyright law has long distinguished between protection for an author’s original expression (e.g., the narrative prose describing the plot) and the public’s right to access the facts and ideas contained within that expression (e.g., a list of characters or the places they visit), the law must also distinguish between copying books for expressive purposes (e.g., reading) and copying them for nonexpressive purposes, such as extracting metadata and conducting macroanalyses. We amici urge the court to follow established precedent with respect to Internet search engines, software reverse engineering, and plagiarism detection software, and to hold that the digitization of books for text-mining purposes is a form of incidental or intermediate copying that should be regarded as fair use as long as the end product is also nonexpressive or otherwise non-infringing.

This brief updates the brief filed in Authors Guild v. HathiTrust.

Keywords: Copyright, text-mining, digital humanities, nonexpressive use, HathiTrust

JEL Classification: K00

Suggested Citation

Jockers, Matthew and Sag, Matthew and Schultz, Jason, Brief of Digital Humanities and Law Scholars as Amici Curiae in Authors Guild v. Google (August 3, 2012). Available at SSRN: or

Matthew Jockers

Washington State University

Pullman, WA 99164
United States

Matthew Sag (Contact Author)

Emory University School of Law

1301 Clifton Road
Atlanta, GA 30322
United States

Jason Schultz

New York University School of Law

40 Washington Square South
New York, NY 10012-1099
United States