The Data-Pooling Problem

52 Pages. Posted: 11 Oct 2015. Last revised: 19 May 2017

Michael Mattioli

Indiana University Maurer School of Law

Date Written: May 1, 2017


American innovation policy as expressed through intellectual property law contains a curious gap: it encourages individual research investments, but does little to facilitate cooperation among inventors, which is often a necessary precondition for innovation.

This Article provides an in-depth analysis of a policy problem that relates to this gap: increasingly, public and private innovation investments depend upon the willingness of private firms and institutions to cooperatively pool industrial, commercial, and scientific data. Data holders often have powerful disincentives to cooperate with one another, however. As a result, important research that the federal government has sought to encourage through intellectual property policy and through other targeted investments is being held back.

This Article addresses this issue by offering three contributions—one theoretical, one empirical, and one prescriptive. The theoretical contribution builds upon legal, economic, and public choice literature to explain why pooling data is relevant to innovation policy, and why the current level of data sharing may often be sub-optimal. This discussion offers a conceptual framework for scholars and policymakers to examine how data-pooling problems can harm innovation policy goals.

This leads to the second contribution: an ethnographic study of private efforts to pool data in an important field of research. Interviews with lawyers, executives, and scientists working at the vanguard of "Big Data" projects in cancer research offer a detailed and sometimes surprising view of how data-pooling problems can hinder technological progress in this field.

The study’s most significant finding is that impediments to the pooling of patient treatment and clinical trial data are diverse, contextual, and not neatly reducible to collective action problems already widely identified by legal scholars and economists, such as the free-rider dilemma or the effect of privacy law on information sharing.

These findings lead to the third key contribution: a set of targeted policy suggestions designed to facilitate data pooling through regulations, amendments to federal healthcare legislation, and tax incentives. These prescriptive measures are tailored to the sharing of health-related data, but they capture an approach that can be applied in other settings where technological progress depends upon data pooling. Ultimately, this Article argues for a vision of innovation policy in which cooperative exchanges of data are regarded as important preconditions for innovation that may often require government support.

Keywords: big data, data pools, data pooling, data-intensive science, intellectual property, collective action, innovation policy, innovation theory, information transactions, information exchange, innovation, health data, cancer treatment, machine learning

JEL Classification: O34, D45, O31, O32, K32

Suggested Citation

Mattioli, Michael, The Data-Pooling Problem (May 1, 2017). Berkeley Technology Law Journal, Vol. 32, No. 1, 2017, Forthcoming.

Michael Mattioli (Contact Author)

Indiana University Maurer School of Law

211 S. Indiana Avenue
Bloomington, IN 47405
United States
