Is AI Ground Truth Really ‘True’? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

Accepted at MIS Quarterly. Citation: Lebovitz, S., Levina, N., & Lifshitz-Assaf, H. (2021). "Is AI ground truth really 'true'? The dangers of training and evaluating AI tools based on experts' know-what." MIS Quarterly, Forthcoming Special Issue on "Managing AI".

55 Pages. Posted: 5 May 2021. Last revised: 3 Jun 2021.

Sarah Lebovitz

University of Virginia - McIntire School of Commerce

Natalia Levina

New York University

Hila Lifshitz-Assaf

New York University (NYU) - Leonard N. Stern School of Business; Harvard University - Business School (HBS)

Date Written: May 4, 2021

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major US hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying out these tools in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts' know-what knowledge captured in the ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI's know-what and experts' know-how enabled managers to better understand the risks and benefits of each tool. This study shows the dangers of treating ground truth labels used in ML models as objective when the underlying knowledge is uncertain. We outline the implications of our study for developing, training, and evaluating AI for knowledge work.
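To make the evaluation issue concrete, the following is a minimal, hypothetical Python sketch (our own illustration, not code or data from the paper): when qualified experts disagree on even a few labels, the same model's reported "accuracy" shifts depending on whose labels are designated as ground truth. All case data, expert labels, and numbers are invented for illustration.

    # Hypothetical sketch: standard accuracy is computed against a single set of
    # "ground truth" labels, so the reported score depends on whose expert
    # judgment is treated as truth.

    model_predictions = ["malignant", "benign", "benign", "malignant", "benign"]

    # Two qualified experts label the same five cases; they disagree on cases
    # 4 and 5, reflecting the know-what uncertainty the paper describes.
    labels_expert_a = ["malignant", "benign", "benign", "benign", "malignant"]
    labels_expert_b = ["malignant", "benign", "benign", "malignant", "benign"]

    def accuracy(predictions, ground_truth):
        """Fraction of cases where the model's prediction matches the label."""
        matches = sum(p == g for p, g in zip(predictions, ground_truth))
        return matches / len(ground_truth)

    print(f"Accuracy vs. expert A's labels: {accuracy(model_predictions, labels_expert_a):.0%}")  # 60%
    print(f"Accuracy vs. expert B's labels: {accuracy(model_predictions, labels_expert_b):.0%}")  # 100%

Neither score is wrong by the standard measure; the measure simply presupposes that one label set is true, which is exactly the assumption the study calls into question.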

Keywords: artificial intelligence, evaluation, uncertainty, new technology, professional knowledge work, innovation, know-how, medical diagnosis, ground truth

Suggested Citation

Lebovitz, Sarah and Levina, Natalia and Lifshitz-Assaf, Hila, Is AI Ground Truth Really 'True'? The Dangers of Training and Evaluating AI Tools Based on Experts' Know-What (May 4, 2021). Accepted at MIS Quarterly, Forthcoming Special Issue on "Managing AI". Available at SSRN: https://ssrn.com/abstract=3839601

Sarah Lebovitz (Contact Author)

University of Virginia - McIntire School of Commerce ( email )

P.O. Box 400173
Charlottesville, VA 22904-4173
United States

Natalia Levina

New York University ( email )

44 West Fourth Street
New York, NY 10012
United States

HOME PAGE: http://pages.stern.nyu.edu/~nlevina

Hila Lifshitz-Assaf

New York University (NYU) - Leonard N. Stern School of Business ( email )

44 West 4th Street
Suite 9-160
New York, NY 10012
United States

Harvard University - Business School (HBS) ( email )

Soldiers Field Road
Cotting House 321A
Boston, MA 02163
United States
