Creating Synthetic Experts with Generative Artificial Intelligence
16 Pages Posted: 20 Aug 2023
Date Written: December 5, 2023
Classification is paramount in today’s data-rich environment as marketers increasingly depend on machine learning to distill intelligence from vast amounts of unstructured text such as news, reports, and social media. Modern classification models swiftly identify constructs of interest, such as sentiment or product categorizations, to inform research and managerial decision-making. Training an effective classification model requires many correctly labeled examples. While simple constructs can be labeled via crowdsourcing, more abstract and multifaceted constructs necessitate expert labelers, a scarce resource. We study whether generative AI, specifically ChatGPT4, can replace domain experts for identifying a central marketing construct in microblogs: brands’ marketing mix. We find that, unlike crowdsourced labels, those generated by ChatGPT4 are in high agreement with expert labels. We overcome ChatGPT4's proprietary nature, slow speed, high cost, and limited reproducibility by approximating it with an open-source model that is fine-tuned on ChatGPT4's labels. The resulting Synthetic Expert exhibits near-parity with ChatGPT4 in terms of expert agreement, is highly scalable, free from third-party constraints, and produces perfectly reproducible results. When paired with sentiment analysis, it reveals different distributions of consumer sentiment across the marketing mix of 699 brands, with substantially varying strengths and weaknesses among competing brands. Deeper analysis uncovers marketing-mix-specific topics that consumers raise online. By introducing Synthetic Twins, AI-generated replicas of training texts that correspond in idea and meaning to their original counterparts, this research mitigates privacy, confidentiality, and intellectual property concerns in model training and data sharing.
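The abstract reports agreement between ChatGPT4 labels and expert labels without naming the metric. As an illustrative sketch only, the snippet below computes Cohen's kappa, a common chance-corrected agreement measure. Its use here is an assumption, not the authors' stated method. The function name `cohen_kappa`, the four-category label set, and the toy labels are all hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Share of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both labelers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: six microblog posts, each labeled with one
# marketing mix element (an assumed four-category scheme).
expert = ["product", "price", "place", "promotion", "price", "product"]
model = ["product", "price", "place", "promotion", "product", "product"]

kappa = cohen_kappa(expert, model)
print(f"Cohen's kappa: {kappa:.3f}")
```

The same comparison could be run for crowdsourced labels against expert labels. A kappa near 1 indicates near-complete agreement beyond chance, while a value near 0 indicates agreement no better than chance.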
Keywords: Classification, Generative Artificial Intelligence, Large Language Models, Marketing Mix Variables