The paradox of lawful text and data mining? Some experiences from the research sector and where we (should) go from here

20 pages. Posted: 4 Nov 2024. Last revised: 26 Oct 2024.

Kacper Szkalej

University of Amsterdam - Institute for Information Law (IViR); University of Amsterdam

Date Written: October 23, 2024

Abstract

Scientific research can be a tricky business. This paper critically explores the 'lawful access' requirement in European copyright law, which applies to text and data mining (TDM) carried out for the purpose of scientific research. Although TDM is essential for data analysis, artificial intelligence (AI) and innovation, the paper argues that the 'lawful access' requirement in Article 3 of the CDSM Directive may actually restrict research by complicating the applicability of the TDM provision or even rendering it inoperable. The requirement is intended to ensure that researchers act in good faith before deploying TDM tools for purposes such as machine learning, but it forces them to ask for permission to access data, for example by taking out a subscription to a service. It thereby gives copyright holders the opportunity to apply a range of commercial strategies to set the legal and technological parameters of access, and potentially even to circumvent the mandatory character of the provision. The paper concludes by drawing on insights from the recent European Commission study 'Improving access to and reuse of research results, publications and data for scientific purposes', which offers essential perspectives for the future of TDM, and by suggesting a number of paths forward that EU Member States can already take to support a more predictable and reliable legal regime for scientific TDM, and potentially code mining, in order to foster innovation.

Keywords: Copyright, text and data mining, AI, machine learning, CDSM Directive, licensing, copyright exceptions

Suggested Citation

Szkalej, Kacper, The paradox of lawful text and data mining? Some experiences from the research sector and where we (should) go from here (October 23, 2024). Available at SSRN: https://ssrn.com/abstract=5000116 or http://dx.doi.org/10.2139/ssrn.5000116

Kacper Szkalej (Contact Author)

University of Amsterdam - Institute for Information Law (IViR)

Rokin 84
Amsterdam, 1012 KX
Netherlands

University of Amsterdam

Spui 21
Amsterdam, 1018 WB
Netherlands
