Foundation Models and Fair Use

61 Pages. Posted: 13 Apr 2023

Peter Henderson

Stanford University - Stanford University, Students

Xuechen Li

Stanford University - Stanford University, Students

Dan Jurafsky

Stanford University - Stanford University, Students

Tatsunori Hashimoto

Stanford University - Stanford University, Students

Mark A. Lemley

Stanford Law School

Percy Liang

Stanford University - Department of Computer Science

Date Written: March 27, 2023

Abstract

Existing foundation models are trained on copyrighted material. Deploying these models can pose both legal and ethical risks when data creators fail to receive appropriate attribution or compensation. In the United States and several other countries, copyrighted content may be used to build foundation models without incurring liability due to the fair use doctrine. However, there is a caveat: If the model produces output that is similar to copyrighted data, particularly in scenarios that affect the market of that data, fair use may no longer apply to the output of the model. In this work, we emphasize that fair use is not guaranteed, and additional work may be necessary to keep model development and deployment squarely in the realm of fair use. First, we survey the potential risks of developing and deploying foundation models based on copyrighted content. We review relevant U.S. case law, drawing parallels to existing and potential applications for generating text, source code, and visual art. Experiments confirm that popular foundation models can generate content considerably similar to copyrighted material. Second, we discuss technical mitigations that can help foundation models stay in line with fair use. We argue that more research is needed to align mitigation strategies with the current state of the law. Lastly, we suggest that the law and technical mitigations should co-evolve. For example, coupled with other policy mechanisms, the law could more explicitly consider safe harbors when strong technical tools are used to mitigate infringement harms. This co-evolution may help strike a balance between intellectual property and innovation, which speaks to the original goal of fair use. But we emphasize that the strategies we describe here are not a panacea and more work is needed to develop policies that address the potential harms of foundation models.

Suggested Citation

Henderson, Peter and Li, Xuechen and Jurafsky, Dan and Hashimoto, Tatsunori and Lemley, Mark A. and Liang, Percy, Foundation Models and Fair Use (March 27, 2023). Available at SSRN: https://ssrn.com/abstract=4404340 or http://dx.doi.org/10.2139/ssrn.4404340

Peter Henderson

Stanford University - Stanford University, Students

Stanford, CA
United States

Xuechen Li

Stanford University - Stanford University, Students

Dan Jurafsky

Stanford University - Stanford University, Students

Tatsunori Hashimoto

Stanford University - Stanford University, Students

Mark A. Lemley (Contact Author)

Stanford Law School

559 Nathan Abbott Way
Stanford, CA 94305-8610
United States

Percy Liang

Stanford University - Department of Computer Science

Gates Computer Science Building
353 Serra Mall
Stanford, CA 94305-9025
United States