Data Rivers: Carving Out the Public Domain in the Age of Generative AI
23 Pages. Posted: 27 Apr 2023. Last revised: 15 May 2023
Date Written: March 14, 2023
The salient question, today, is not whether ‘copyright law [will] allow robots to learn’. The pressing question is whether the fragile data ecosystem that makes generative AI possible can be re-balanced through intervention that is timely enough. The threats to this ecosystem come from multiple fronts. They are comparable in kind to the threats currently affecting ‘water rivers’ across the globe.
First, just as the fundamental human right to water is only possible if ‘reasonable use’ and reciprocity constraints are imposed on the economic exploitation of rivers, so too is the fundamental right to access culture, learn from it and build upon it. It is that right, and the moral aspirations underlying it, that has led millions to share their creative works under ‘open’ licenses. Generative AI tools would not have been possible without access to that rich, high-quality content. Yet few of those tools respect the reciprocity expectations without which the Creative Commons and Open-Source movements cease to be sustainable. The absence of internationally coordinated standards to systematically identify AI-generated content also threatens our ‘data rivers’ with irreversible pollution.
Second, the process that has allowed large corporations to seize control of data and its definition as an asset subject to property rights has effectively enabled the construction of hard structures (canals or dams) that have led to the rights of many of those lying up- or downstream of such structures being ignored. While data protection laws seek to address those power imbalances by granting ‘personal’ data rights, the exercise of those rights remains demanding, just as it is challenging for artists to defend their IP rights in the face of AI-generated works that threaten them with redundancy.
To tackle the above threats, the long overdue reform of copyright can only be part of the required intervention. Equally important is the construction of bottom-up empowerment infrastructure that gives long-term agency to those wishing to share their data and/or creative works. This infrastructure would also play a central role in reviving much-needed democratic engagement. Data not only carries traces of our past. It is also a powerful tool to envisage different futures. There is no doubt that tools such as GPT-4 will change us. We would be fools to believe we may leverage those tools in the service of a variety of futures by merely imposing sets of ‘post-hoc’ regulatory constraints.
Keywords: Large language models, GPT-4, generative AI, public domain, copyright, empowerment, data trusts, privilege