Guided Topic Modeling with Word2Vec: A Technical Note
29 Pages Posted: 5 Oct 2023
Date Written: September 19, 2023
Abstract
We propose GTM (Guided Topic Modeling), an algorithm that enables fast and flexible generation of comprehensive topic clusters from (a pair of) seed words. The algorithm is unsupervised: it performs clustering in the word-embedding space, and several hyperparameters allow the characteristics of the resulting topic clusters to be adjusted. Applications of this methodology include information retrieval, classification, and the calculation of various topic indices from news feeds.
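To make the idea of growing a topic cluster from seed words in an embedding space concrete, the following is a minimal sketch and not the GTM algorithm itself, whose details are not given in this abstract. It assumes a simple nearest-to-centroid rule based on cosine similarity. The function name `expand_topic`, the hyperparameters `top_n` and `min_sim`, and the toy three-dimensional vectors are all hypothetical and chosen for illustration; in practice the vectors would come from a trained Word2Vec model.

```python
import numpy as np


def expand_topic(seeds, vocab, vectors, top_n=3, min_sim=0.5):
    """Return up to top_n non-seed words closest to the seed centroid.

    Illustrative only: a cosine-similarity neighbourhood around the seeds,
    not the clustering procedure proposed in the paper.
    """
    index = {word: i for i, word in enumerate(vocab)}
    # Normalise rows so that dot products equal cosine similarities.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Represent the seed set by the normalised mean of its unit vectors.
    centroid = unit[[index[s] for s in seeds]].mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = unit @ centroid
    ranked = np.argsort(-sims)  # most similar first
    picked = [
        (vocab[i], float(sims[i]))
        for i in ranked
        if vocab[i] not in seeds and sims[i] >= min_sim
    ]
    return picked[:top_n]


# Toy "embeddings", invented for illustration (assumption, not real data).
vocab = ["inflation", "prices", "cpi", "football", "goal"]
vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.8, 0.0, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
])

topic = expand_topic(["inflation", "prices"], vocab, vectors)
print([word for word, _ in topic])
```

With these toy vectors only "cpi" clears the similarity threshold. Lowering `min_sim` or raising `top_n` widens the cluster, which loosely mirrors how hyperparameters could trade precision for coverage.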
Keywords: Topic Models, Text Analytics, Machine Learning, Natural Language Processing, Word2Vec, Alternative Data
JEL Classification: C55, C80, G10, C45