Analysis of UK/EU Law on Data Mining in Higher Education Institutions

40 Pages Posted: 22 Apr 2013

Andrés Guadamuz

University of Sussex

Diane Cabell

University of Oxford - Oxford e-Research Centre

Date Written: January 15, 2013


Data or text mining (hereafter called “content mining”) is a process that uses software to look for interesting or important patterns in data that might otherwise go unobserved. For example, combining a database of journal articles about groundwater pollution with a database of hospital admissions might reveal a pollution-related pattern of disease outbreak.

It is also a useful tool in commerce. A credit card company might detect a correlation between purchases of tickets from a particular airline and purchases of certain types of automobiles, and then develop a marketing programme uniting the appropriate vendors. One McKinsey report states that the use of ‘big data’ in the sphere of public data alone could create €250 billion in annual value for Europe’s economy.

Content mining is increasingly accomplished by machine. Databases, particularly those produced by scientific research, are far too large to be scanned by the human eye. However, the right to mine data is not assured by law in most jurisdictions, and even where it is, the terms of access to the majority of research publication databases deny permission to do so. One recent study indicated that obtaining permission from the many different publishers to mine the thousands of articles appearing on a single subject would require 62% of a researcher’s time. Many content owners, including research institutions, have yet to develop any policy on content mining.

This report identifies the main legal barriers to data mining and data reuse and makes policy suggestions to guide governments, funding agencies, and research institutions. As the title suggests, the study focuses on legal issues that are specific to higher education institutions (HEIs).

The first challenge for this report is to delimit the subject matter, as various types of content can be subject to automated analysis. HEIs can hold and share content in various formats. Here are just a few examples:

Text: published articles, book chapters, preparatory notes, working papers, reports, teaching materials, conference papers, presentations, theses.

Datasets: statistical data, geolocation data, survey results, maps, figures, time series, genetic information, health records, computer logs.

Multimedia: pictures, sound recordings, interviews, presentations, video.

Each of the above may be subject to a separate legal regime. For convenience and simplicity, when discussing database contents the report does not distinguish between text, data and multimedia unless the text clearly specifies otherwise.

Keywords: cyberlaw, copyright, data mining, content mining, databases, database right, UK, EU, public sector information, creative commons, open access

JEL Classification: K00

Suggested Citation

Guadamuz, Andres and Cabell, Diane, Analysis of UK/EU Law on Data Mining in Higher Education Institutions (January 15, 2013). Available at SSRN: or

Andres Guadamuz (Contact Author)

University of Sussex

Brighton, BN1 9QN
United Kingdom

Diane Cabell

University of Oxford - Oxford e-Research Centre

7 Keble Road
Oxford, OX1 3QG
United Kingdom