Is AI Ground Truth Really ‘True’? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What

Citation: Lebovitz, S., Levina, N., & Lifshitz-Assaf, H. (2021). "Is AI ground truth really true? The dangers of training and evaluating AI tools based on experts' know-what." Management Information Systems Quarterly, 45(3b), pp. 1501-1525. Available online: https://doi.org/10.25300/MISQ/2021/16564

Posted: 5 May 2021 Last revised: 21 Mar 2023


Sarah Lebovitz

University of Virginia - McIntire School of Commerce

Natalia Levina

New York University (NYU) - Department of Information, Operations, and Management Sciences

Hila Lifshitz-Assaf

Harvard University Lab for Innovation Sciences; University of Warwick, Warwick Business School

Date Written: May 4, 2021

Abstract

Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools outperform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major US hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows the dangers of treating ground truth labels used in ML models as objective when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.
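To make the abstract's central point concrete, here is a minimal Python sketch (with entirely hypothetical data and labels, not drawn from the study) of how an AI tool's reported accuracy depends on whose labels are treated as ground truth. When qualified experts disagree, the same predictions can look strong against one expert's labels and weak against another's.

```python
# Minimal sketch (hypothetical data): reported "accuracy" depends on which
# expert's labels are treated as ground truth. This illustrates the
# abstract's argument; it does not reproduce the paper's field data.

def accuracy(predictions, labels):
    """Fraction of cases where the model agrees with the given labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

# Ten diagnostic cases, binary labels (1 = abnormal, 0 = normal).
model_predictions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

# Two qualified experts label the same cases but disagree on 4 of 10,
# reflecting the uncertainty of expert know-what.
expert_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # labels used as "ground truth"
expert_b = [1, 0, 0, 1, 1, 0, 1, 1, 1, 1]

print(f"Accuracy vs. expert A: {accuracy(model_predictions, expert_a):.0%}")  # 100%
print(f"Accuracy vs. expert B: {accuracy(model_predictions, expert_b):.0%}")  # 60%
```

The point, per the abstract, is that accuracy measures inherit whatever uncertainty the labeling experts' know-what carries: high reported accuracy against one set of labels can coexist with poor performance in practice.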

Keywords: artificial intelligence, evaluation, uncertainty, new technology, professional knowledge work, innovation, know-how, medical diagnosis, ground truth

Suggested Citation

Lebovitz, Sarah and Levina, Natalia and Lifshitz-Assaf, Hila, Is AI Ground Truth Really ‘True’? The Dangers of Training and Evaluating AI Tools Based on Experts’ Know-What (May 4, 2021). Management Information Systems Quarterly, 45(3b), pp. 1501-1525, https://doi.org/10.25300/MISQ/2021/16564. Available at SSRN: https://ssrn.com/abstract=3839601

Sarah Lebovitz (Contact Author)

University of Virginia - McIntire School of Commerce ( email )

P.O. Box 400173
Charlottesville, VA 22904-4173
United States

Natalia Levina

New York University (NYU) - Department of Information, Operations, and Management Sciences ( email )

44 West Fourth Street
New York, NY 10012
United States

HOME PAGE: https://wp.nyu.edu/natalialevina/

Hila Lifshitz-Assaf

Harvard University Lab for Innovation Sciences ( email )

Soldiers Field Road
Cotting House 321A
Boston, MA 02163
United States

Harvard LISH, Lab for Innovation Sciences ( email )

William James Hall, Sixth Floor
33 Kirkland Street
Cambridge, MA 02138

University of Warwick, Warwick Business School ( email )

West Midlands, CV4 7AL
United Kingdom

HOME PAGE: https://www.hilalifshitz.com/


Paper statistics

Abstract Views: 7,306