Copyright Data Improvement for AI Licensing – The Role of Content Moderation and Text and Data Mining Rules
19 Pages
Posted: 7 May 2024
Date Written: May 4, 2024
Abstract
To enable European authors, performers and creative industries to benefit from licensing opportunities in the field of new technologies, such as AI training, it is important to establish a comprehensive metadata infrastructure that ensures the visibility and accessibility of European work repertoires in digital and algorithmic environments, and facilitates rights clearance. Recognising the need for metadata improvement, various European initiatives aim to increase awareness among artists and rightholders, and build bridges between existing metadata collections and infrastructures. One central factor in the equation, however, has remained underexplored to this day: copyright norms may serve as legal vehicles to encourage rightholders to constantly provide updated metadata in standardised form. By strategically using copyright provisions as statutory incentive schemes for data improvement, metadata creation and updating could become a task which rightholders perform routinely. Copyright rules that already require the transmission of work-related information could be transformed into data improvement instruments that contribute to the evolution of accurate, harmonised and interoperable metadata. For instance, it seems possible to transform the notification of work-related information under Article 17 of the 2019 Directive on Copyright in the Digital Single Market (CDSMD) and the opt-out mechanism relating to text and data mining (TDM) under Article 4 of the same Directive into regulatory frameworks that generate a broader spectrum of descriptive and ownership data. If information stemming from these metadata engines is pooled in a central European copyright data repository, the accumulation of copyright data could lead to a metadata reservoir that is capable of enhancing licensing and remuneration opportunities in digital and algorithmic contexts.
The analysis of this metadata mainstreaming strategy first describes problems arising from inadequate copyright metadata and past initiatives that sought to improve the copyright data infrastructure in the EU. It then outlines new licensing opportunities that arise in the area of AI training. An exploration of existing data-related rules in the EU copyright acquis against this background shows that work notifications for content blocking purposes under Article 17 CDSMD and the reservation of copyright with regard to text and data mining under Article 4 CDSMD have remarkable potential to foster copyright data improvement projects. Hence, it is worthwhile to explore options for transforming these copyright rules into legal tools to encourage metadata creation and improvement.
Keywords: copyright metadata, generative AI, training data, harmonization, interoperability, standardization, transparency, TDM, content moderation, prohibition of formalities, EUIPO, cultural diversity, licensing, rights clearance, transaction costs, collective rights management organizations