How Well Can AI do Strategy? Empirical Benchmarking Using Strategy Simulations

Allen, Ryan; McDonald, Rory

doi:10.2139/ssrn.5239555

32 Pages Posted: 7 May 2025

Ryan Allen

University of Washington - Department of Management & Organization

Rory McDonald

University of Virginia - Darden School of Business

Date Written: May 01, 2025

Abstract

AI research has introduced several benchmarks tracking how large language models (LLMs) have rapidly advanced in lower-level tasks such as math, science, reading comprehension, and coding. Yet no systematic evaluation criteria currently exist to assess LLMs' unaided performance in strategic decision-making. The absence of a reliable benchmark limits strategy scholars' ability to answer fundamental questions about AI's capacity to augment or automate core strategic management decisions. We propose that AI's performance on established strategy teaching simulations offers a promising benchmark, as these exercises replicate the complexity and uncertainty of strategic decision-making in a controlled, validated, and replicable environment. In this paper, we benchmark the performance of OpenAI's models on the Back Bay Battery simulation, a widely used exercise in courses on strategy and innovation. Designed to test decision-making under uncertainty, the simulation requires participants to balance trade-offs between short-term profitability and long-term competitive positioning, while integrating diverse information about customer preferences, competitive moves, and evolving technologies over extended time horizons. We created an interface that allows AI to interact with the simulation without any fine-tuning or prompting beyond the information available within the simulation itself. We find that OpenAI's latest o3-mini model performs on par with MBA students from a top school. Other recent models (GPT-4o, o1-mini), while not as strong as o3-mini, significantly outperform earlier versions (GPT-4, GPT-3.5), although the pace of progress appears to have slowed. Beyond showing that AI can make effective strategic decisions, our simulation-based approach offers a useful empirical benchmark for tracking its future development.

Keywords: Strategy, Artificial Intelligence, Large Language Models, Strategic Decision-Making

JEL Classification: L26, O33, D83, C63

Suggested Citation:

Allen, Ryan and McDonald, Rory, How Well Can AI do Strategy? Empirical Benchmarking Using Strategy Simulations (May 01, 2025). Available at SSRN: https://ssrn.com/abstract=5239555 or http://dx.doi.org/10.2139/ssrn.5239555