Centralization, Fragmentation, and Replication in the Genomic Data Commons
Centralization, Fragmentation, and Replication in the Genomic Data Commons, in Governing Medical Knowledge Commons (Brett M. Frischmann, Michael J. Madison, and Katherine J. Strandburg eds., Cambridge University Press 2017)
30 Pages. Posted: 26 Aug 2015. Last revised: 4 Jan 2018.
Date Written: August 24, 2015
Researchers around the world deposit enormous amounts of genomic sequence data and related information into public databases, thus creating a genomic data commons. This chapter examines specific governance challenges of correcting, updating, and annotating these data. Delving into the science of genome sequencing, assembly, and annotation, it highlights the indeterminate nature of sequence data and related information and the high rate of errors in public databases such as GenBank. Drawing on the Institutional Analysis and Development framework, it then examines four approaches for dynamically correcting and modifying these data: author-centric data management, third-party biocuration, community-based wikification, and specialized databases and genome browsers. Notably, these approaches reveal deep tensions between centralization and fragmentation in the structure of the genomic data commons. On the one hand, author-centric data management and third-party biocuration represent highly centralized mechanisms for controlling data. On the other hand, wiki-based annotation disperses control throughout the community, exploiting the power of the commons and parallel data analysis to update existing data records. Attempting to capture the best of both worlds, specialized databases and genome browsers exploit replication and the nonrivalrous nature of information to preserve original data records while allowing users to codify vast amounts of value-added knowledge. This study shows that, far from being a passive repository of information, the genomic data commons is a teeming, dynamic entity in which communal intervention is critical to enhancing collective knowledge. Ultimately, the genomic data commons is an intensely human commons in more ways than one.
Keywords: commons, genomics, GenBank, data, databases, biocuration, wikification, gene browsers, NIH