Large Language Models for Market Research: A Data-augmentation Approach

47 Pages. Posted: 6 Jan 2025. Last revised: 6 Jan 2025.

Mengxin Wang

University of Texas at Dallas - Naveen Jindal School of Management

Dennis Zhang

Washington University in St. Louis - John M. Olin Business School

Heng Zhang

Supply Chain Management Department - W. P. Carey School of Business

Date Written: December 15, 2024

Abstract

Large Language Models (LLMs) have transformed artificial intelligence by excelling in complex natural language processing tasks. Their ability to generate human-like text has opened new possibilities for market research, particularly in conjoint analysis, where understanding consumer preferences is essential but often resource-intensive. Traditional survey-based methods face limitations in scalability and cost, making LLM-generated data a promising alternative. However, while LLMs have the potential to simulate real consumer behavior, recent studies highlight a significant gap between LLM-generated and human data, with biases introduced when one is substituted for the other. In this paper, we address this gap by proposing a novel statistical data augmentation approach that efficiently integrates LLM-generated data with real data in conjoint analysis. Our method leverages transfer learning principles to debias the LLM-generated data using a small amount of human data. This results in statistically robust estimators that are consistent and asymptotically normal, in contrast to naive approaches that simply substitute LLM-generated data for human data, which can exacerbate bias. We validate our framework through an empirical study on COVID-19 vaccine preferences, demonstrating its superior ability to reduce estimation error and save data and costs by 32.7% to 83.8%. In contrast, naive approaches fail to save data due to the inherent biases in LLM-generated data compared to human data. Another empirical study on sports car choices validates the robustness of our results. Our findings suggest that while LLM-generated data is not a direct substitute for human responses, it can serve as a valuable complement when used within a robust statistical framework.
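
The contrast the abstract draws between naive substitution and debiasing with a small human sample can be illustrated with a toy simulation. The sketch below is not the authors' estimator, which is built for conjoint analysis and is not specified on this page. It swaps in a much simpler difference-style bias correction on a single binary choice. The function name `simulate_estimators`, the parameters, and the data-generating process (a shared latent "appeal" plus an LLM shift toward option A) are all assumptions made for illustration.

```python
import random


def simulate_estimators(seed=0, n_human=300, n_llm=20000, llm_shift=0.5):
    """Compare three estimators of the share of humans choosing option A.

    Toy setup (an assumption, not the paper's model): the true human share
    is 0.5 by symmetry, and the simulated LLM over-prefers option A.
    """
    rng = random.Random(seed)

    def one_task():
        # Shared latent appeal of option A; this makes the human and LLM
        # answers to the same task correlated.
        appeal = rng.gauss(0.0, 1.0)
        human = 1 if appeal + rng.gauss(0.0, 0.5) > 0 else 0
        # Systematic LLM bias toward option A, controlled by llm_shift.
        llm = 1 if appeal + llm_shift + rng.gauss(0.0, 0.5) > 0 else 0
        return human, llm

    # Small paired sample: a human answer and an LLM answer for each task.
    paired = [one_task() for _ in range(n_human)]
    # Large LLM-only sample: the human answer is never observed.
    llm_only = [one_task()[1] for _ in range(n_llm)]

    human_sum = sum(h for h, _ in paired)
    human_only = human_sum / n_human

    # Naive approach: pool LLM answers as if they were human answers.
    naive_pooled = (human_sum + sum(llm_only)) / (n_human + n_llm)

    # Bias correction: estimate the LLM-minus-human gap on the paired
    # sample, then subtract it from the large-sample LLM mean.
    gap = sum(m - h for h, m in paired) / n_human
    debiased = sum(llm_only) / n_llm - gap

    return {
        "human_only": human_only,
        "naive_pooled": naive_pooled,
        "debiased": debiased,
    }


if __name__ == "__main__":
    # The true human share in this toy setup is 0.5.
    for name, value in simulate_estimators().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the pooled estimate inherits the LLM's shift almost entirely, because the LLM sample dominates the pool. The corrected estimate stays centered on the human share and, when LLM and human answers are correlated, can be less noisy than the human-only mean.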

Keywords: Conjoint Analysis, Data Augmentation, Large Language Models

Suggested Citation

Wang, Mengxin and Zhang, Dennis and Zhang, Heng, Large Language Models for Market Research: A Data-augmentation Approach (December 15, 2024). Available at SSRN: https://ssrn.com/abstract=5057769 or http://dx.doi.org/10.2139/ssrn.5057769

Mengxin Wang

University of Texas at Dallas - Naveen Jindal School of Management

P.O. Box 830688
Richardson, TX 75083-0688
United States

Dennis Zhang

Washington University in St. Louis - John M. Olin Business School

One Brookings Drive
Campus Box 1133
St. Louis, MO 63130-4899
United States

Heng Zhang (Contact Author)

Supply Chain Management Department - W. P. Carey School of Business

Tempe, AZ
United States
