Data Standardization
30 Pages · Posted: 14 Feb 2019 · Last revised: 4 Jul 2019
Date Written: June 2019
Abstract
With data rapidly becoming the lifeblood of the global economy, the efficiency of its use significantly affects both social and private welfare. While the volume of data collected has reached astronomical proportions, most of it is gathered in a largely modular and distributed system, creating a “Tower of Babel” of different databases that potentially limits synergetic knowledge production. Data standardization is key to facilitating and improving the use of data. Indeed, standardization is a precondition for the operation of industries in which cross-firm and cross-industry data exchanges are critical. It can also create substantial benefits when data synergies carry high value. It does so by reducing the three main technological obstacles to data portability and interoperability: metadata uncertainties, data transfer obstacles, and missing data.
Despite the importance of data standardization, the role of the government in such standardization is rarely examined. This article analyzes the justifications for and the limitations of data standardization in light of data’s special characteristics. We first review the relevant characteristics of data and data markets. We discuss the three main technological obstacles to widening the use of data, based on interviews with data scientists. We then explain how data standardization affects these obstacles and analyze the benefits and costs of data standardization. As shown, data raises new considerations about the appropriate interventionist role for regulators that are generally absent from debates about the standardization of other products, and it adds a novel dimension to some of the traditional considerations raised in more typical cases. For example, on the plus side, data standardization can lead to smoother data flows, better machine learning, and easier policing in cases where rights are infringed or unjustified harms are created by data-fed algorithms. It might also help support a more competitive and distributed data collection ecosystem. At the same time, increasing the scale and scope of data analysis can create negative externalities in the form of better profiling, greater harms to privacy, and heightened cybersecurity risks. Repercussions for investment and innovation in data collection and analysis are also analyzed.
Finally, we explore whether market-led standardization initiatives can be relied upon to increase welfare, and evaluate the role, if any, that government-facilitated standardization should play. We show that the need for reviewing and possibly facilitating data standards can be especially strong where potential data synergies are cross-industry or inter-temporal. We also explore the appropriateness of different regulatory methods for achieving these tasks. While governmental facilitation of data standardization may be justified only in limited scenarios, the current situation, in which data standardization is rarely considered, carries a large price tag.