Copyright Regenerated: Harnessing GenAI to Measure Originality and Copyright Scope
64 Pages. Posted: 5 Aug 2023. Last revised: 6 Nov 2023
Date Written: August 3, 2023
Abstract
The rise of Generative Artificial Intelligence (GenAI) models is revolutionizing the creative domain. By using models like GitHub Copilot, OpenAI GPT, Stable Diffusion, Midjourney, or DeviantArt, non-professional users can generate high-quality content such as text, images, music, or code. These powerful tools enable new, previously unimaginable forms of human creativity on a large scale, disrupting the professional creative sectors. This article proposes a novel approach that leverages the capacity of GenAI to assist in copyright legal disputes.
GenAI models are trained on examples, generalizing expressive patterns and applying these learnings to perform different tasks, such as auto-completing sentences or generating visual outputs in response to a textual prompt. These models are designed to grasp complex probability distributions from training samples by identifying recurring relationships between input and output data.
Similarly, humans learn from a corpus of preexisting materials, memorize impressions, learn styles, extract themes from text, generalize principles from new materials, and engage in deconstructing and reconstructing processes. Unlike human learning, which occurs within the confines of the human mind, GenAI learning involves digital replication. Consequently, GenAI has sparked numerous class actions alleging copyright infringement. These claims assert that the models infringe copyright because they are trained on copyrighted materials without authorization, because they generate derivative works of those materials, or both.
While copyright law prohibits the unauthorized copying of protected expressions, it permits the extrapolation and learning of ideas. For a work to be copyrighted, it must be original, meaning the author must originate it. As a result, the law does not protect expressions that are generic and, therefore, cannot be attributed to any particular author, such as ideas, scènes à faire, or conventional programming standards.
For centuries, courts have struggled to consistently differentiate between original expressions and generic ones, resulting in systematic over-protection of copyrighted works. GenAI presents an unprecedented opportunity to inform and improve this legal analysis. By learning from data at various levels of granularity, GenAI systems are revealing the shared patterns in preexisting works that were previously difficult to measure accurately.
In this article, we propose a novel approach for measuring originality to assist in copyright legal disputes. We harness the powerful learning capacity of GenAI to gain more nuanced insights into the genericity of expressions on a significantly larger scale. Based on interdisciplinary research in computer science and law, we propose employing data-driven bias, a fundamental aspect of inductive machine learning, to assess the genericity of expressive compositions in preexisting works.
During learning, GenAI models distill and rank expressive compositions based on their prevalence in the models’ datasets. The more frequently these expressive compositions appear in the GenAI models’ datasets (indicating their “generic” nature), the more likely GenAI models are to utilize them when generating new works. Conversely, the more rarely these expressive compositions appear in the GenAI models’ datasets (indicating their “original” nature), the less likely GenAI models are to utilize them.
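The link between prevalence and genericity can be illustrated with a deliberately simple stand-in. The sketch below is not the method proposed in the article and does not involve a GenAI model: it assumes that word trigrams can stand in for "expressive compositions" and that the share of corpus works containing a trigram can stand in for how strongly a trained model would favor it. The function names `ngrams` and `genericity_scores` and the four-line corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter


def ngrams(text, n=3):
    """Split text into lowercase word n-grams (a crude proxy for 'compositions')."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def genericity_scores(corpus, candidate, n=3):
    """Map each n-gram of `candidate` to the share of corpus works containing it.

    A share near 1.0 suggests a generic composition; a share near 0.0
    suggests a rare, and in this toy sense more original, one.
    """
    if not corpus:
        return {}
    doc_freq = Counter()
    for work in corpus:
        # Count each n-gram at most once per work (document frequency).
        doc_freq.update(set(ngrams(work, n)))
    total = len(corpus)
    return {gram: doc_freq[gram] / total for gram in ngrams(candidate, n)}


# Hypothetical toy corpus: three works share a stock opening, one does not.
corpus = [
    "once upon a time there was a king",
    "once upon a time a dragon slept",
    "once upon a time in a small village",
    "the clockwork heron sang to the tide",
]
scores = genericity_scores(corpus, "once upon a time the clockwork heron sang")

# Print compositions from most generic to least generic.
for gram, share in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(gram), round(share, 2))
```

In this toy setting the stock opening appears in most works and scores high, while the phrase found in a single work scores low. A real GenAI model learns far richer statistical structure than trigram counts, so the sketch conveys only the direction of the relationship described above.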
Leveraging the capacity of GenAI to learn with greater nuance and on a much grander scale could have groundbreaking implications for copyright law. It could assist courts in determining copyright scope, potentially leading to more efficient and equitable resolutions. Moreover, it has the potential to inform the Copyright Office’s registration practices and provide a valuable signal to facilitate market licensing transactions. Finally, by harnessing GenAI to measure originality at scale, our approach offers valuable insights to policymakers as they grapple with adapting copyright law to meet the new challenges of an era of “cheap creativity” enabled by GenAI.
Keywords: Intellectual property, Artificial Intelligence, AI, Copyright, Originality