Copyright Data Improvement for AI Licensing – The Role of Content Moderation and Text and Data Mining Rules

19 Pages Posted: 7 May 2024


Martin Senftleben

Institute for Information Law (IViR), University of Amsterdam; University of Amsterdam

Date Written: May 4, 2024

Abstract

To enable European authors, performers and creative industries to benefit from licensing opportunities in the field of new technologies, such as AI training, it is important to establish a comprehensive metadata infrastructure that ensures the visibility and accessibility of European work repertoires in digital and algorithmic environments and facilitates rights clearance. Recognising the need for metadata improvement, various European initiatives aim to increase awareness among artists and rightholders and to build bridges between existing metadata collections and infrastructures. One central factor in the equation, however, has remained underexplored to this day: copyright norms may serve as legal vehicles that encourage rightholders to continuously provide updated metadata in standardised form. By strategically using copyright provisions as statutory incentive schemes for data improvement, metadata creation and updating could become a task that rightholders perform routinely. Copyright rules that already require the transmission of work-related information could be transformed into data improvement instruments that contribute to the evolution of accurate, harmonised and interoperable metadata. For instance, it seems possible to transform the notification of work-related information under Article 17 of the 2019 Directive on Copyright in the Digital Single Market (CDSMD) and the opt-out mechanism relating to text and data mining (TDM) under Article 4 of the same Directive into regulatory frameworks that generate a broader spectrum of descriptive and ownership data. If information stemming from these metadata engines is pooled in a central European copyright data repository, the accumulated copyright data could form a metadata reservoir capable of enhancing licensing and remuneration opportunities in digital and algorithmic contexts.

The analysis of this metadata mainstreaming strategy first describes problems arising from inadequate copyright metadata and past initiatives that sought to improve the copyright data infrastructure in the EU. It then outlines new licensing opportunities that arise in the area of AI training. Against this background, an exploration of existing data-related rules in the EU copyright acquis shows that work notifications for content blocking purposes under Article 17 CDSMD and the reservation of copyright with regard to text and data mining under Article 4 CDSMD have remarkable potential to foster copyright data improvement projects. Hence, it is worthwhile to explore options for transforming these copyright rules into legal tools that encourage metadata creation and improvement.

Keywords: copyright metadata, generative AI, training data, harmonization, interoperability, standardization, transparency, TDM, content moderation, prohibition of formalities, EUIPO, cultural diversity, licensing, rights clearance, transaction costs, collective rights management organizations

Suggested Citation

Senftleben, Martin, Copyright Data Improvement for AI Licensing – The Role of Content Moderation and Text and Data Mining Rules (May 4, 2024). Available at SSRN: https://ssrn.com/abstract=4817796 or http://dx.doi.org/10.2139/ssrn.4817796

Martin Senftleben (Contact Author)

Institute for Information Law (IViR), University of Amsterdam

Rokin 84
Amsterdam, 1012 KX
Netherlands

University of Amsterdam

Roetersstraat 11
Amsterdam, NE 1018 WB
Netherlands

