The Training of Generative AI Is Not Text and Data Mining

European Intellectual Property Review (E.I.P.R.), forthcoming 2/2025

28 Pages. Posted: 19 Dec 2024. Last revised: 20 Dec 2024

Tim W. Dornis

Leibniz University Hannover; New York University School of Law

Date Written: October 19, 2024

Abstract

The creative capacities of generative artificial intelligence (AI) systems can be attributed to the extensive training of the underlying models. This training uses massive amounts of data, most of which are protected by copyright. While the discussion in the US is conducted in light of the fair use defence, AI developers in Europe rely on the exceptions for text and data mining under the DSM Directive 2019/790. However, a closer look at the technological foundations of generative AI training reveals that the text and data mining exception does not apply. The training of generative AI models without licences for the works used as training data is therefore copyright infringement.

Keywords: AI, Artificial Intelligence, copyright, Text and Data Mining, TDM exception, generative AI models, DSM Directive, AI Act

Suggested Citation

Dornis, Tim W., The Training of Generative AI Is Not Text and Data Mining (October 19, 2024). European Intellectual Property Review (E.I.P.R.), forthcoming 2/2025, Available at SSRN: https://ssrn.com/abstract=4993782 or http://dx.doi.org/10.2139/ssrn.4993782

Tim W. Dornis (Contact Author)

Leibniz University Hannover

Königsworther Platz 1
Hannover, 30167
Germany

New York University School of Law

40 Washington Square South
New York, NY 10012-1099
United States
