The Heart of the Matter: Copyright, AI Training, and LLMs

29 Pages. Posted: 1 Nov 2024. Last revised: 1 Nov 2024

Daniel J. Gervais

Vanderbilt University - Law School

Noam Shemtov

Queen Mary University of London, Centre for Commercial Law Studies

Haralambos Marmanis

Copyright Clearance Center

Catherine Zaller Rowland

Copyright Clearance Center

Date Written: September 21, 2024

Abstract

This article examines the intersection of copyright law and large language models (LLMs), an artificial intelligence technology that has rapidly gained prominence. The authors analyze the copyright implications of training, fine-tuning, and using LLMs, processes that often involve ingesting vast amounts of copyrighted material. The paper begins by explaining the technical aspects of LLMs, including tokenization, word embeddings, and the stages of LLM development, as a foundation for the legal analysis that follows. The authors then turn to copyright law, examining potential infringement issues related to both the inputs and the outputs of LLMs. A comparative legal analysis covers the United States, European Union, United Kingdom, Japan, Singapore, and Switzerland, and scrutinizes the relevant copyright exceptions and limitations in these jurisdictions, including fair use in the US and the text and data mining exceptions in the EU. The authors highlight the uncertainties and challenges in applying these legal concepts to LLMs, particularly in light of recent court decisions and legislative developments. The paper also addresses the potential impact of the EU's AI Act on copyright considerations, including its extraterritorial effects, and explores the concept of "making available" in the context of LLMs and its implications for copyright infringement. Recognizing the legal uncertainties and the need for a balanced approach that fosters both innovation and copyright protection, the authors propose licensing as a key solution. They advocate a combination of direct and collective licensing models as a practical framework for the responsible use of copyrighted materials in AI systems.

This article offers valuable insights for legal scholars, policymakers, and industry professionals grappling with the copyright challenges posed by LLMs. It contributes to the ongoing dialogue on adapting copyright law to technological advancements while maintaining its fundamental purpose of incentivizing creativity and innovation.

Suggested Citation

Gervais, Daniel J. and Shemtov, Noam and Marmanis, Haralambos and Zaller Rowland, Catherine, The Heart of the Matter: Copyright, AI Training, and LLMs (September 21, 2024). Available at SSRN: https://ssrn.com/abstract=4963711 or http://dx.doi.org/10.2139/ssrn.4963711

Daniel J. Gervais (Contact Author)

Vanderbilt University - Law School

131 21st Avenue South
Nashville, TN 37203-1181
United States
615 322 2615 (Phone)

Noam Shemtov

Queen Mary University of London, Centre for Commercial Law Studies

Charterhouse Square
London, WC2A 3JB
United Kingdom

Haralambos Marmanis

Copyright Clearance Center

Catherine Zaller Rowland

Copyright Clearance Center
