The paradox of lawful text and data mining? Some experiences from the research sector and where we (should) go from here
20 Pages Posted: 4 Nov 2024 Last revised: 26 Oct 2024
Date Written: October 23, 2024
Abstract
Scientific research can be tricky business. This paper critically explores the 'lawful access' requirement in European copyright law which applies to text and data mining (TDM) carried out for the purpose of scientific research. Whereas TDM is essential for data analysis, artificial intelligence (AI) and innovation, the paper argues that the 'lawful access' requirement in Article 3 CDSM Directive may actually restrict research by complicating the applicability of the TDM provision or even rendering it inoperable. Although the requirement is intended to ensure that researchers act in good faith before deploying TMD tools for purposes such as machine learning, it forces them to ask for permission to access data, for example by taking out a subscription to a service, and for that reason provides the opportunity for copyright holders to apply all sorts of commercial strategies to set the legal and technological parameters of access and potentially even circumvent the mandatory character of the provision. The paper concludes by drawing on insights from the recent European Commission study 'Improving access to and reuse of research results, publications and data for scientific purposes' that offer essential perspectives for the future of TDM, and by suggesting a number of paths forward that EU Member States can take already now in order to support a more predictable and reliable legal regime for scientific TDM and potentially code mining to foster innovation.
Keywords: Copyright, text and data mining, AI, machine learning, CDSM Directive, licensing, copyright exceptions
Suggested Citation: Suggested Citation