Unsupervised Machine Learning for Clustering in Political and Social Research
New York, NY: Cambridge University Press (Forthcoming)
62 Pages Posted: 4 Nov 2020
Date Written: September 1, 2020
In the age of data-driven problem-solving, the ability to apply cutting-edge computational tools to explain substantive phenomena in a digestible way to a wide audience is an increasingly valuable skill. Such skills are no less important in political and social research. Yet the application of quantitative methods often assumes an understanding of the data, structure, patterns, and concepts that directly influence the broader research program. Researchers frequently approach analysis without being fully aware of the precise structure and nature of their data, or of what to expect from it. Further, the teaching of social science research methods often overlooks that exploring data is a key stage of applied research, one that precedes predictive modeling and hypothesis testing. Exploration, though, requires knowledge of appropriate methods for understanding data in the service of discerning patterns, which in turn contribute to the development of theories and testable expectations. This Element seeks to fill this gap by offering researchers and instructors an introduction to clustering, a prominent class of unsupervised machine learning methods for exploring, mining, and understanding data. I detail several widely used clustering techniques and pair each with R code and real data to facilitate interaction with the concepts. Three unsupervised clustering algorithms are introduced: agglomerative hierarchical clustering, k-means clustering, and Gaussian mixture models. I conclude with a high-level look at three advanced methods: fuzzy C-means, DBSCAN, and partitioning around medoids clustering. The goal is to bring applied researchers into the world of unsupervised machine learning, both theoretically and practically. All code can be run interactively on the cloud computing platform Code Ocean, guiding readers through implementation of the algorithms and techniques.
Keywords: machine learning, unsupervised learning, clustering, political science, social science, EDA