Synthetic Data: Legal Implications of the Data-Generation Revolution
58 Pages Posted: 20 Apr 2023
Date Written: April 10, 2023
A data-generation revolution is underway. To date, most data used for algorithmic decision-making has been collected from events that take place in the physical world (“collected” data). However, this is about to change. By 2024, it is predicted that 60% of data used to train artificial intelligence systems globally will be synthetic. Synthetic data is artificially generated data, created using generative AI, that has analytical value. It can be used to replace collected data by preserving or mimicking its properties, or to supplement collected data to improve its completeness or to enhance privacy protections. This article analyses the legal implications of synthetic data for markets and for society. It argues that synthetic data challenges the equilibrium found in existing laws, which strike a balance between competing values, including data utility, privacy, security, and human rights. This article seeks to bring state-of-the-art data generation methods into the legal debate and to propose legal reforms that capture the unique characteristics of synthetic data. While some of the challenges discussed here also arise with the use of collected data, synthetic data exacerbates these challenges, putting further pressure on existing regulatory regimes and making data governance reform more urgent.
Keywords: IT law, artificial intelligence law, regulation, big data, data privacy, personal data