Unsupervised Machine Learning for Clustering in Political and Social Research

New York, NY: Cambridge University Press (Forthcoming)

62 Pages; Posted: 4 Nov 2020

Philip Waggoner

Columbia University, ISERP; YouGov America

Date Written: September 1, 2020


In the age of data-driven problem-solving, the ability to apply cutting-edge computational tools to explain substantive phenomena in a digestible way to a wide audience is an increasingly valuable skill. Such skills are no less important in political and social research. Yet the application of quantitative methods often assumes an understanding of the data, structure, patterns, and concepts that directly influence the broader research program. Researchers are often not entirely aware of the precise structure and nature of their data, or of what to expect from their data when approaching analysis. Further, the teaching of social science research methods often overlooks that exploring data is a key stage in applied research, one that precedes predictive modeling and hypothesis testing. These tasks, though, require knowledge of appropriate methods for exploring and understanding data in the service of discerning patterns, which contribute to the development of theories and testable expectations. This Element seeks to fill this gap by offering researchers and instructors an introduction to clustering, a prominent class of unsupervised machine learning for exploring, mining, and understanding data. I detail several widely used clustering techniques, and pair each with R code and real data to facilitate interaction with the concepts. Three unsupervised clustering algorithms are introduced: agglomerative hierarchical clustering, k-means clustering, and Gaussian mixture models. I conclude by offering a high-level look at three advanced methods: fuzzy C-means, DBSCAN, and partitioning around medoids clustering. The goal is to bring applied researchers into the world of unsupervised machine learning, both theoretically and practically. All code can be run interactively on the cloud computing platform Code Ocean to guide readers through implementation of the algorithms and techniques.

Keywords: machine learning, unsupervised learning, clustering, political science, social science, EDA

Suggested Citation

Waggoner, Philip, Unsupervised Machine Learning for Clustering in Political and Social Research (September 1, 2020). New York, NY: Cambridge University Press (Forthcoming), Available at SSRN: https://ssrn.com/abstract=3693395

Philip Waggoner (Contact Author)

Columbia University, ISERP

3022 Broadway
New York, NY 10027
United States

HOME PAGE: http://pdwaggoner.github.io/

YouGov America

432 Park Avenue South, Floor 5
New York, NY 10016
United States
