Guided Topic Modeling with Word2Vec: A Technical Note

29 Pages, Posted: 5 Oct 2023

Thomas Dangl

Vienna University of Technology

Stefan Salbrechter

Deka Investment GmbH; Vienna University of Technology

Date Written: September 19, 2023

Abstract

We propose GTM (Guided Topic Modeling), an algorithm that enables the fast and flexible generation of comprehensive topic clusters from (a pair of) seed words. The unsupervised algorithm performs clustering in the word-embedding space, and several hyperparameters allow the characteristics of the topic clusters to be adjusted. Applications of this methodology include information retrieval, classification, and the calculation of various topic indices from news feeds.
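
To make the idea of growing a topic cluster from seed words in an embedding space more concrete, the following is a minimal sketch and not the GTM algorithm itself, whose details are given in the paper rather than in this abstract. It assumes a simple swapped-in technique: average the seed vectors into a centroid and keep every vocabulary word whose cosine similarity to that centroid exceeds a cutoff. The toy vectors, the function name `expand_seeds`, and the hyperparameters `threshold` and `max_size` are all hypothetical, chosen only for illustration.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real application would load vectors from a trained Word2Vec model.
TOY_VECTORS = {
    "inflation": [0.9, 0.1, 0.0],
    "prices": [0.8, 0.2, 0.1],
    "cpi": [0.85, 0.15, 0.05],
    "election": [0.1, 0.9, 0.1],
    "vote": [0.05, 0.95, 0.0],
    "football": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seeds(seeds, vectors, threshold=0.95, max_size=10):
    """Grow a word cluster around the centroid of the seed vectors.

    `threshold` and `max_size` are hypothetical hyperparameters that
    stand in for the kind of tuning knobs the abstract mentions.
    """
    dim = len(next(iter(vectors.values())))
    # Centroid of the seed words in the embedding space.
    centroid = [
        sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dim)
    ]
    # Score every non-seed word by similarity to the centroid.
    scored = [
        (word, cosine(centroid, vec))
        for word, vec in vectors.items()
        if word not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Seeds come first, followed by sufficiently similar words.
    cluster = list(seeds) + [w for w, score in scored if score >= threshold]
    return cluster[:max_size]


topic = expand_seeds(["inflation", "prices"], TOY_VECTORS)
print(topic)
```

With these toy vectors, the pair of seeds "inflation" and "prices" pulls in "cpi" while leaving the politics and sports words out. Lowering `threshold` in this sketch would widen the cluster, which illustrates in a simplified way how a hyperparameter can change the character of a topic.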

Keywords: Topic Models, Text Analytics, Machine Learning, Natural Language Processing, Word2Vec, Alternative Data

JEL Classification: C55, C80, G10, C45

Suggested Citation

Dangl, Thomas and Salbrechter, Stefan, Guided Topic Modeling with Word2Vec: A Technical Note (September 19, 2023). Available at SSRN: https://ssrn.com/abstract=4575985 or http://dx.doi.org/10.2139/ssrn.4575985

Thomas Dangl

Vienna University of Technology

Theresianumgasse 27
Vienna, A-1040
Austria

Stefan Salbrechter (Contact Author)

Deka Investment GmbH

Mainzer Landstrasse 16
Frankfurt am Main, 60325
Germany

Vienna University of Technology

Karlsplatz 13
Vienna
Austria
