A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do?
89 Pages. Posted: 10 Mar 2023. Last revised: 23 May 2024
Date Written: May 20, 2024
Abstract
Problem definition: Large language models (LLMs) are increasingly being leveraged in business and consumer decision-making processes. Since LLMs learn from human data and feedback, which can be biased, determining whether LLMs exhibit human-like behavioral decision biases (e.g., base-rate neglect, risk aversion, confirmation bias) is crucial prior to integrating LLMs into decision-making contexts and workflows. To investigate this, we examine 18 common human biases that are important in operations management (OM) using the dominant LLM, ChatGPT.
Methodology/results: We perform experiments where GPT-3.5 and GPT-4 act as participants to test these biases using vignettes adapted from the literature ("Standard context") and variants reframed in inventory and general OM contexts. In almost half of the experiments using the Standard context, GPT mirrors human biases, diverging from prototypical human responses in the remaining experiments. We also observe that GPT models show a notable level of consistency between the Standard and OM-specific experiments. Our comparative analysis between GPT-3.5 and GPT-4 also reveals a dual-edged progression of GPT's decision-making, wherein it advances in decision-making accuracy for problems with well-defined mathematical solutions while simultaneously displaying increased behavioral biases for preference-based problems.
Managerial implications: First, our results highlight that managers will obtain the greatest benefits from deploying GPT to workflows leveraging established formulas. Second, GPT displayed a high level of response consistency across the Standard, Inventory, and non-inventory Operational contexts in our experiments, providing optimism that LLMs can provide reliable support even when details of the decision and problem contexts change. Third, although selecting between models like GPT-3.5 and GPT-4 represents a trade-off in cost and performance, our results suggest that managers should invest in the higher-performing model, particularly for solving problems with objective solutions.
Keywords: ChatGPT, behavior, bias, decision-making, experiment, framing, overconfidence, ambiguity, prospect theory