Copyright Law and Generative AI Training - Technological and Legal Foundations
(Urheberrecht und Training generativer KI-Modelle - Technologische und juristische Grundlagen)
Posted: 9 Oct 2024
Date Written: August 29, 2024
Abstract
Generative AI is transforming creative fields by rapidly producing texts, images, music, and videos. These AI creations often seem as impressive as human-made works but require extensive training on vast amounts of data, much of which is protected by copyright. This dependency on copyrighted material has sparked legal debates, as AI training involves “copying” and “reproducing” these works, actions that could infringe copyright. In defense, AI proponents in the United States invoke “fair use” under Section 107 of the Copyright Act, while in Europe, they cite Article 4(1) of the 2019 DSM Directive, which allows certain uses of copyrighted works for “text and data mining.”
This study challenges the prevailing European legal stance, presenting several arguments:
1. The exception for text and data mining should not apply to generative AI training because the technologies differ fundamentally - text and data mining processes semantic information only, while generative AI training also extracts syntactic information.
2. There is no suitable copyright exception or limitation to justify the massive infringements occurring during the training of generative AI. This concerns the copying of protected works during data collection, the full or partial replication inside the AI model, and the reproduction of works from the training data initiated by the end-users of AI systems like ChatGPT.
3. Even if AI training occurs outside Europe, developers cannot fully avoid European copyright laws. If works are replicated inside an AI model, making the model available in Europe could infringe the “right of making available” under Article 3 of the InfoSoc Directive. Accordingly, offering AI services to European users ultimately subjects developers to European copyright laws and European courts’ jurisdiction.
This study suggests rethinking copyright issues in the context of AI. Given the technical revolution and socio-economic disruptions that generative AI brings, lawmakers should reconsider how to balance the protection of human creativity with the interest in AI innovation. The current lack of regulation neglects the technical realities and is thus not only legally unsound but also unjust.
Note: Downloadable document is in German.
Keywords: Artificial Intelligence, generative AI, intellectual property, copyright, text and data mining, TDM, Künstliche Intelligenz, generative KI-Modelle, Urheberrecht, TDM-Schranke, ChatGPT, Stable Diffusion, DSM Directive