‘Equality and Privacy by Design’: Ensuring Artificial Intelligence (AI) Is Properly Trained & Fed: A New Model of AI Data Transparency & Certification As Safe Harbor Procedures

65 Pages Posted: 5 Dec 2018 Last revised: 27 Dec 2018

Shlomit Yanisky-Ravid

Yale Law School; ONO Academic College; Yale University - Information Society Project; Fordham University, School of Law

Sean Hallisey

Fordham University School of Law

Date Written: November 5, 2018


Artificial Intelligence (“AI”) systems are often described as a technological breakthrough that will transform our society and economy. AI systems have been implemented in all facets of the economy, from medicine and transportation to finance, art, law, social services, and weaponry, making decisions previously made by humans. While this article recognizes that AI systems promise benefits, it also identifies urgent challenges to our everyday life. Just as the technology has become prolific, so has the literature concerning its legal implications. That literature, however, lacks solutions that address both the legal and the engineering perspectives. This gap leaves technology firms without guidelines and increases the risk of societal harm, while policymakers, including judges, lack a regulatory regime to turn to when addressing these novel and unpredictable outcomes. This article seeks to fill the void by focusing on the data these systems use, rather than on the software and its programmers. It proposes a new Model that stems from a recognition of the significant role data plays in the development and functioning of AI systems.

One of the most important phases of teaching AI systems to operate begins with a massive preexisting dataset that data providers use to train the system. These data providers include programmers, trainers, stakeholders who enable access to data, and the systems’ users. In this article, we analyze and discuss the threats that AI systems’ use of data poses, both in producing discriminatory outcomes and in violating privacy.

The data can be illegal, discriminatory, fabricated, unreliable, or simply incomplete. The more data AI systems “swallow,” the greater the likelihood that they will produce biased, discriminatory decisions and/or violate privacy. The article discusses how discrimination can arise, even inadvertently, from the operation of “trusted” and “objective” AI systems. It addresses, on the one hand, the hurdles and challenges behind AI systems’ use of big data and, on the other, suggests a possible new solution.

We propose a new AI data transparency Model that focuses on disclosure, when necessary, of the data used by AI systems. To complete the Model, we recommend an auditing regime and a certification program, administered either by a governmental body or, in the absence of such an entity, by private institutes. This Model would encourage the industry to take proactive steps to ensure that its datasets are trustworthy and then to publicly exhibit the quality of the data on which its AI systems rely. By receiving and publicizing a quality “stamp,” firms would fairly build their competitive reputations and strengthen public oversight of these systems.

We envision that implementing this Model will help firms and individuals become educated about the potential issues concerning AI, discrimination, and the continued weakening of societal expectations of privacy. In this sense, the AI data transparency Model operates as a safe harbor mechanism that incentivizes the industry, from the first steps of developing and training AI systems through their actual operation, to implement effective standards that we coin “Equality and Privacy by Design.”

The suggested AI Transparency Model functions as a safe harbor even without massive regulatory steps. From an engineering point of view, the Model not only recognizes data providers and big data as the most important components in creating, training, and operating AI systems; it is also technologically feasible, since the data can readily be captured and retained by technological tools. The Model is practically feasible as well, because it follows existing legal frameworks of data transparency, such as those implemented by the FDA and the SEC.

We argue that improving transparency in data systems should result in less harmful AI systems, better protect societal rights and norms, and produce improved outcomes in this emerging field, specifically for minority communities, which often lack the resources or representation to combat the use of AI systems. We assert that greater transparency regarding the data used in developing, training, or operating AI systems could mitigate and reduce these harms. We recommend critical evaluations and audits of the data used to train AI systems to identify such risks, and propose a certification system through which AI systems’ developers can publicize good faith efforts to reduce the possibility of discriminatory outcomes and privacy violations. We do not purport to solve the riddle of every possible negative outcome created by AI systems; instead, we aim to incentivize new standards that the industry could implement from day one of developing AI systems, standards that address the possibility of harm rather than relying on post-hoc assignments of liability.

Suggested Citation

Yanisky-Ravid, Shlomit and Hallisey, Sean, ‘Equality and Privacy by Design’: Ensuring Artificial Intelligence (AI) Is Properly Trained & Fed: A New Model of AI Data Transparency & Certification As Safe Harbor Procedures (November 5, 2018). Available at SSRN: https://ssrn.com/abstract=3278490 or http://dx.doi.org/10.2139/ssrn.3278490

Shlomit Yanisky-Ravid (Contact Author)

Yale Law School

127 Wall Street
New Haven, CT 06511
United States

ONO Academic College

Tzahal Street 104
Kiryat Ono, 55000
Israel

Yale University - Information Society Project

P.O. Box 208215
New Haven, CT 06520-8215
United States

Fordham University, School of Law

140 West 62nd Street
New York, NY 10023
United States

Sean Hallisey

Fordham University School of Law

140 West 62nd Street
New York, NY 10023
United States
