The Narrative AI Advantage? A Field Experiment on Generative AI-Augmented Evaluations of Early-Stage Innovations
60 Pages
Posted: 3 Aug 2024
Date Written: August 02, 2024
Abstract
The rise of generative artificial intelligence (AI) is transforming creative problem-solving, necessitating new approaches for evaluating innovative solutions. This study explores how human-AI collaboration can enhance early-stage evaluations, focusing on the interplay between objective criteria, which are quantifiable, and subjective criteria, which rely on personal judgment. We conducted a field experiment with MIT Solve, involving 72 experts and 156 community screeners who evaluated 48 solutions for the 2024 Global Health Equity Challenge. Screeners received assistance from GPT-4, which offered recommendations and, in some cases, a rationale. We compared a human-only control group with two AI-assisted treatments: a black box AI and a narrative AI with probabilistic explanations justifying its decisions. Our findings show that AI-assisted screeners were 9 percentage points more likely to fail a solution. For objective criteria, there was no significant difference between the black box and narrative AI conditions. However, for subjective criteria, screeners adhered to narrative AI’s recommendations 12 percentage points more often than the black box AI’s. These effects were consistent across both experts and non-experts. Mouse tracking data showed that deeper engagement with AI’s objective failure recommendations led to more overrides of the AI, particularly in the narrative AI condition, reflecting increased scrutiny. Conversely, deeper engagement with AI’s subjective failure recommendations led to greater alignment with AI, particularly in the black box condition. This research underscores the importance of developing AI interaction expertise in creative evaluation processes that combine human judgment with AI insights. While AI can standardize decision-making for objective criteria, human oversight and critical thinking remain indispensable in subjective assessments, where AI should complement, not replace, human judgment.
Keywords: Creative evaluation, human-AI collaboration, large language models, screening, subjectivity, innovation, AI decision-support, field experiment, social impact