Training Is Everything: Artificial Intelligence, Copyright, and Fair Training

Dickinson Law Review, Forthcoming

22 Pages. Posted: 9 May 2023

Andrew W. Torrance

University of Kansas School of Law; MIT Sloan School of Management

Bill Tomlinson

University of California, Irvine; Victoria University of Wellington - Te Herenga Waka

Date Written: May 4, 2023

Abstract

Artificial intelligence (“AI”) leapt into the public consciousness in 2022. It did so not because of a popular Hollywood movie, like The Terminator, or the extravagant claim of a company or pundit. Rather, it earned this newfound public attention through its sudden usefulness and practicality. In quick succession, OpenAI, a software company based in San Francisco, released a graphics generator (that is, DALL-E 2), a text generator (that is, GPT-3.5), and then a chatbot (that is, ChatGPT) capable of carrying on compelling conversations with people who have no formal computer science training. Other companies, such as Stability AI and Discord, contributed to the ready availability of AI tools easy enough for many people to use. After decades of hype, AI finally achieved its first milestone of democratization.

However, there is a sine qua non lurking behind these democratized sources of AI that has triggered a substantial legal response. To learn how to behave, the current revolutionary generation of AIs must be trained on vast quantities of published images, written works, and sounds, many of which fall within the core subject matter of copyright law. To some, the use of copyrighted works as training sets for AI is merely a transitory and non-consumptive use that does not materially interfere with owners’ content or the copyrights protecting it. Companies that use such content to train their AI engines often believe such usage should be considered “fair use” under United States law (sometimes known as “fair dealing” in other countries). By contrast, many copyright owners, as well as their supporters, consider the incorporation of copyrighted works into training sets for AI to constitute misappropriation of owners’ intellectual property and, thus, decidedly not fair use under the law. This debate is vital to the future trajectory of AI and its applications.

In this article, we analyze the arguments in favor of, and against, viewing the use of copyrighted works in training sets for AI as fair use. We call this form of fair use “fair training.” We identify both strong and spurious arguments on both sides of this debate. In addition, we attempt to take a broader perspective, weighing the societal costs (e.g., replacement of certain forms of human employment) and benefits (e.g., the possibility of novel AI-based approaches to global issues such as environmental disruption) of allowing AI to make easy use of copyrighted works as training sets to facilitate the development, improvement, adoption, and diffusion of AI. Finally, we suggest that the debate over AI and copyrighted works may be a tempest in a teapot when placed in the wider context of massive societal challenges such as poverty, inequality, climate change, and loss of biodiversity, to which AI may be part of the solution.

Keywords: law, artificial intelligence, fair use, fair training

Suggested Citation

Torrance, Andrew W. and Tomlinson, Bill, Training Is Everything: Artificial Intelligence, Copyright, and Fair Training (May 4, 2023). Dickinson Law Review, Forthcoming, Available at SSRN: https://ssrn.com/abstract=4437680

Andrew W. Torrance

University of Kansas School of Law

Green Hall
1535 W. 15th Street
Lawrence, KS 66045-7577
United States

MIT Sloan School of Management

100 Main Street
Cambridge, MA 02142
United States

Bill Tomlinson (Contact Author)

University of California, Irvine

Bren Hall
Irvine, CA 92697-3440
United States

Victoria University of Wellington - Te Herenga Waka

P.O. Box 600
Wellington, 6140
New Zealand
