The Language of Images: Performance of Image Classification Paradigms in Marketing
41 Pages. Posted: 28 Sep 2022. Last revised: 6 Dec 2024.
Date Written: December 05, 2024
Abstract
Images say more than a thousand words. But can marketing leverage language modeling concepts to study image content? Marketing utilizes image classification for studying how advertising, social media, or e-commerce images relate to consumer perceptions and economic outcomes. To accomplish this, marketing publications apply variants of convolutional neural networks that analyze local image patterns by examining neighboring pixels. Recent transformer architectures, inspired by language modeling, study images more holistically by learning relationships across distant image parts. Even newer vision language models based on generative AI advances create detailed text interpretations of what they ‘see’ in images. We study the benefits of these advances based on 18 marketing-related datasets that cover what and who is visible, as well as how images are perceived. On average, language modeling concepts improve image classification accuracy by more than 10 percentage points. When training data is abundant, transformer architectures perform best and also most consistently across datasets. Performance of vision language models varies. Relative to alternatives, these models perform strongest with limited training data and for complex tasks focused on how images are perceived. Combining them with transformer-inspired architectures as a multi-paradigm ensemble achieves the best of both worlds, with the highest and most consistent performance across all tasks and datasets we study.
Keywords: generative AI, computer vision, image mining, machine learning, image classification, marketing insight