Unsupervised Discovery of Non-Trivial Similarities between Online Communities

30 Pages. Posted: 7 Mar 2022

Abraham Israeli

Ben-Gurion University of the Negev

Shani Cohen

Ben-Gurion University of the Negev

Oren Tsur

Ben-Gurion University of the Negev

Abstract

In this work we introduce C3, a novel unsupervised approach for community comparison. C3 creates contextual pairwise representations by aligning communities and tuning word embeddings according to both the lexical context and the social context reflected by the community’s structure and engagement patterns. C3 is evaluated on a dataset of 1565 active Reddit communities, and its results are compared against three competitive models. We show through an array of experiments and validations that C3 recovers nuanced, non-trivial similarities between communities that are not captured by other methods. We complement our quantitative results with a qualitative analysis, discussing recovered non-trivial similarities between community pairs such as (opiates:adhd), (babyBumps:depression), and (wallStreetBets:sandersForPresident), none of which are recovered by the other models. This qualitative analysis demonstrates the exploratory power of our model.

Keywords: Machine Learning, Online Communities, Natural Language Processing, Word Embeddings, Social Network Analysis, Computational Social Science

Suggested Citation

Israeli, Abraham and Cohen, Shani and Tsur, Oren, Unsupervised Discovery of Non-Trivial Similarities between Online Communities. Available at SSRN: https://ssrn.com/abstract=4051307 or http://dx.doi.org/10.2139/ssrn.4051307

Abraham Israeli (Contact Author)

Ben-Gurion University of the Negev

1 Ben-Gurion Blvd
Beer-Sheba 84105
Israel

Shani Cohen

Ben-Gurion University of the Negev

1 Ben-Gurion Blvd
Beer-Sheba 84105
Israel

Oren Tsur

Ben-Gurion University of the Negev
