Unsupervised Discovery of Non-Trivial Similarities between Online Communities
30 Pages. Posted: 7 Mar 2022
Abstract
In this work we introduce C3, a novel unsupervised approach for community comparison. C3 creates contextual pairwise representations by aligning communities and tuning word embeddings according to both the lexical context and the social context, the latter reflected by the community's structure and its engagement patterns. C3 is evaluated on a dataset of 1565 active Reddit communities, and its results are compared against three competitive models. We show through an array of experiments and validations that C3 recovers nuanced and non-trivial similarities between communities that are not captured by other methods. We complement our quantitative results with a qualitative analysis, discussing recovered non-trivial similarities between community pairs such as (opiates:adhd), (babyBumps:depression), and (wallStreetBets:sandersForPresident), none of which are recovered by the other models. This qualitative analysis demonstrates the exploratory power of our model.
Keywords: Machine Learning, Online Communities, Natural Language Processing, Word Embeddings, Social Network Analysis, Computational Social Science