The New Legal Landscape for Text Mining and Machine Learning

Journal of the Copyright Society of the USA, Vol 66 (2019)

64 pages. Posted: 26 Feb 2019. Last revised: 1 Apr 2019

Matthew Sag

Loyola University Chicago School of Law

Date Written: February 9, 2019

Abstract

Individually and collectively, copyrighted works have the potential to generate information that goes far beyond what their individual authors expressed or intended. Various methods of computational and statistical analysis of text, usually referred to as text data mining (“TDM”) or just text mining, can unlock that information. However, because almost every use of TDM involves making copies of the text to be mined, the legality of that copying has become a fraught issue in copyright law in the United States and around the world. One of the most fundamental questions for copyright law in the Internet age is whether the protection of the author’s original expression should stand as an obstacle to the generation of insights about that expression. How this question is answered will have a profound influence on the future of research across the sciences and the humanities, and on the development of the next generation of information technology: machine learning and artificial intelligence.

This Article consolidates a theory of copyright law that I have advanced in a series of articles and amicus briefs over the past decade. It explains why applying copyright’s fundamental principles in the context of new technologies necessarily implies that copying expressive works for non-expressive purposes should not be counted as infringement and must be recognized as fair use. The Article shows how that theory was adopted and applied in the recent high-profile test cases, Authors Guild v. HathiTrust and Authors Guild v. Google, and takes stock of the legal context for TDM research in the United States in the aftermath of those decisions.

The Article makes important contributions to copyright theory, but it also integrates that theory with a practical assessment of the various interrelated legal issues that text mining researchers and their supporting institutions must confront if they are to realize the full potential of these technologies. These issues include the enforceability of website terms of service, the effect of laws prohibiting computer hacking and the circumvention of technological protection measures (i.e., encryption and other digital locks), and cross-border copyright issues.

Keywords: Copyright, Fair Use, Terms of Use, Computer Hacking, Digital Rights Management, Computer Fraud and Abuse Act, Digital Single Market Directive, Text Mining, Text Data Mining, Digital Humanities, Machine Learning, Artificial Intelligence, Internet Search, Reverse Engineering, Plagiarism Detection

JEL Classification: K00, C88

Suggested Citation

Sag, Matthew, The New Legal Landscape for Text Mining and Machine Learning (February 9, 2019). Journal of the Copyright Society of the USA, Vol 66 (2019). Available at SSRN: https://ssrn.com/abstract=3331606 or http://dx.doi.org/10.2139/ssrn.3331606

Matthew Sag (Contact Author)

Loyola University Chicago School of Law

25 E. Pearson
Chicago, IL 60611
United States
