Guided Diverse Concept Miner (GDCM): Uncovering Relevant Constructs for Managerial Insights From Text
Information Systems Research
57 Pages Posted: 21 Dec 2018 Last revised: 3 Apr 2024
Date Written: May 20, 2018
Abstract
Guided Diverse Concept Miner (GDCM) is an interpretable deep learning algorithm that (1) automatically extracts corpus-level concepts from text data, (2) focuses concept discovery on only those concepts highly correlated with a user-specified managerial outcome, and (3) quantifies each concept’s correlational importance to that outcome. GDCM can be used to explore, and potentially extract, previously unknown concepts and insights in text that may explain the managerial outcome, without requiring any human-predefined guidance or labeled data on concepts. GDCM embeds words, documents, and concepts in the same vector space, so each discovered concept can be interpreted through the words located near its concept vector. GDCM is explicitly configured to increase the diversity and coherence of the recovered concepts and their relevance to managerial outcomes.
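Because words and concepts share one vector space, a concept can be read off by listing the words whose embeddings lie closest to it. The sketch below illustrates only that interpretation step. It assumes cosine similarity as the closeness measure and uses hypothetical toy embeddings and a hypothetical concept vector. The helper names `cosine` and `nearest_words` are illustrative and are not GDCM’s actual implementation, which this abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(concept_vec, word_vecs, k=3):
    """Return the k words whose embeddings are most cosine-similar to concept_vec.

    Assumption: closeness is measured by cosine similarity (illustrative choice).
    """
    scored = [(cosine(concept_vec, vec), word) for word, vec in word_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


# Hypothetical toy word embeddings (not taken from the paper).
word_vecs = {
    "battery": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.0],
    "display": [0.2, 0.8, 0.1],
    "price": [0.0, 0.1, 0.9],
}

# Hypothetical learned concept vector living in the same space as the words.
concept = [1.0, 0.15, 0.05]

print(nearest_words(concept, word_vecs, k=2))  # ['battery', 'charge']
```

In this toy setup the concept sits near the “battery” and “charge” embeddings, so a reader would label it as something like battery life. With real embeddings, the same nearest-neighbor listing is what makes a learned concept vector human-readable.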
We demonstrate GDCM as a “guided exploratory” tool in a hypothetical managerial case involving online purchase journey data linked to the reviews customers consumed. GDCM scalably extracts concepts hidden in customer reviews that are highly correlated with conversion, and it reports each concept’s importance relative to product ratings. The extracted concepts turn out to be product qualities that prior literature has theorized to affect conversion, and the correlational importance gauged by GDCM closely matches estimates from a previous causal study run on a similar dataset; both results serve as external validation of GDCM as a “guided exploratory” tool. Additional experiments on other data show that the extracted insights are sensitive to the guiding managerial variables, and sensibly so, further demonstrating GDCM’s flexibility as a managerial tool.
Keywords: Text, Managerial Insight Extraction, Guided Exploration, Concept Extraction, Deep Learning, Interpretable Machine Learning
JEL Classification: C38, C39, M31, M39