A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do?

89 Pages. Posted: 10 Mar 2023. Last revised: 23 May 2024

Yang Chen

University of Western Ontario - Richard Ivey School of Business

Samuel Kirshner

University of New South Wales (UNSW)

Anton Ovchinnikov

Smith School of Business - Queen's University; INSEAD - Decision Sciences

Meena Andiappan

University of Toronto

Tracy Jenkin

Queen's University - Smith School of Business

Date Written: May 20, 2024

Abstract

Problem definition: Large language models (LLMs) are increasingly used in business and consumer decision-making processes. Because LLMs learn from human data and feedback, which can be biased, determining whether LLMs exhibit human-like behavioral decision biases (e.g., base-rate neglect, risk aversion, confirmation bias) is crucial before integrating LLMs into decision-making contexts and workflows. To investigate this question, we examine 18 common human biases that are important in operations management (OM) using the dominant LLM, ChatGPT.

Methodology/results: We perform experiments where GPT-3.5 and GPT-4 act as participants to test these biases using vignettes adapted from the literature ("Standard context") and variants reframed in inventory and general OM contexts. In almost half of the experiments using the Standard context, GPT mirrors human biases, diverging from prototypical human responses in the remaining experiments. We also observe that GPT models have a notable level of consistency between the Standard and OM-specific experiments. Our comparative analysis between GPT-3.5 and GPT-4 also reveals a dual-edged progression of GPT's decision-making, wherein it advances in decision-making accuracy for problems with well-defined mathematical solutions while simultaneously displaying increased behavioral biases for preference-based problems.

Managerial implications: First, our results highlight that managers will obtain the greatest benefits from deploying GPT in workflows that rely on established formulas. Second, GPT displayed a high level of response consistency across the Standard, Inventory, and non-inventory Operational contexts in our experiments, providing optimism that LLMs can provide reliable support even when details of the decision and problem contexts change. Third, although selecting between models like GPT-3.5 and GPT-4 represents a trade-off between cost and performance, our results suggest that managers should invest in the higher-performing model, particularly for solving problems with objective solutions.

Keywords: ChatGPT, behavior, bias, decision-making, experiment, framing, overconfidence, ambiguity, prospect theory

Suggested Citation

Chen, Yang and Kirshner, Samuel and Ovchinnikov, Anton and Andiappan, Meena and Jenkin, Tracy, A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do? (May 20, 2024). Available at SSRN: https://ssrn.com/abstract=4380365 or http://dx.doi.org/10.2139/ssrn.4380365

Yang Chen

University of Western Ontario - Richard Ivey School of Business

1151 Richmond Street North
London, Ontario N6A 3K7
Canada

Samuel Kirshner

University of New South Wales (UNSW)

Kensington
High St
Sydney, NSW 2052
Australia

Anton Ovchinnikov (Contact Author)

Smith School of Business - Queen's University

143 Union Str. West
Kingston, ON K7L3N6
Canada

INSEAD - Decision Sciences

United States

Meena Andiappan

University of Toronto

105 St George Street
Toronto, M5S 3G8
Canada

Tracy Jenkin

Queen's University - Smith School of Business

Paper statistics

Downloads: 2,414
Abstract Views: 7,832
Rank: 11,609