A Supervised Machine Learning Procedure to Detect Electoral Fraud Using Digital Analysis

42 Pages Posted: 23 Apr 2010

Francisco Cantu

University of California, San Diego - Department of Political Science

Sebastian M. Saiegh

University of California, San Diego (UCSD) - Department of Political Science

Date Written: April 22, 2010

Abstract

This paper introduces a naive Bayes classifier that detects electoral fraud from digit patterns in vote counts, using both authentic and synthetic data. The procedure has four steps: (1) we use Monte Carlo methods to create 10,000 simulated electoral contests between two parties. This training set consists of two disjoint subsets: one containing electoral returns that follow a Benford distribution, and another in which the vote counts are deliberately "manipulated" by electoral tampering, with a percentage of votes taken away from one party and given to the other; (2) we calibrate the membership values of the simulated elections (i.e., clean or fraudulent) using logistic regression; (3) we recover class-conditional densities using the relative frequencies from the training set; (4) we apply Bayes' rule to the class-conditional probabilities and class priors to establish the membership probabilities of the authentic observations. To illustrate our technique, we examine elections in the province of Buenos Aires (Argentina) between 1932 and 1942, a period with a checkered history of fraud. Our analysis allows us to successfully classify electoral contests according to their degree of fraud. More generally, our findings indicate that Benford's Law is an effective tool for identifying fraud, even when only minimal information (i.e., electoral returns) is available.
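Steps (1), (3), and (4) of the procedure can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: clean vote counts are drawn log-uniformly (which yields Benford-distributed leading digits), only the first digit of each party's count is used as a feature, the fraud share is fixed at 30 percent, the class prior is 0.5, and the training set has 1,000 contests instead of 10,000 to keep it fast. The logistic-regression calibration in step (2) is omitted. All function names are hypothetical.

```python
import math
import random


def benford_probs():
    """First-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def simulate_contest(rng, n_units=50, fraud_share=0.0):
    """Simulate one two-party contest as (votes_a, votes_b) per voting unit.

    Assumption: clean counts are log-uniform on [10, 10**4), so their
    leading digits follow Benford's Law. With fraud_share > 0, that share
    of party A's votes in each unit is moved to party B.
    """
    contest = []
    for _ in range(n_units):
        a = int(10 ** rng.uniform(1, 4))
        b = int(10 ** rng.uniform(1, 4))
        moved = int(fraud_share * a)
        contest.append((a - moved, b + moved))
    return contest


def digit_features(contest):
    """Leading digits of each party's count, one pair per voting unit."""
    return [(first_digit(a), first_digit(b)) for a, b in contest]


def fit_class_conditionals(contests):
    """Step (3): relative digit frequencies per party, add-one smoothed."""
    counts = [{d: 1 for d in range(1, 10)} for _ in range(2)]
    for contest in contests:
        for da, db in digit_features(contest):
            counts[0][da] += 1
            counts[1][db] += 1
    return [{d: c[d] / sum(c.values()) for d in c} for c in counts]


def posterior_fraud(contest, clean_tables, fraud_tables, prior_fraud=0.5):
    """Step (4): Bayes' rule with the naive independence assumption."""
    log_clean = math.log(1 - prior_fraud)
    log_fraud = math.log(prior_fraud)
    for da, db in digit_features(contest):
        log_clean += math.log(clean_tables[0][da]) + math.log(clean_tables[1][db])
        log_fraud += math.log(fraud_tables[0][da]) + math.log(fraud_tables[1][db])
    # Subtract the larger log score before exponentiating to avoid underflow.
    m = max(log_clean, log_fraud)
    w_clean = math.exp(log_clean - m)
    w_fraud = math.exp(log_fraud - m)
    return w_fraud / (w_clean + w_fraud)


# Step (1): a synthetic training set with two disjoint subsets.
rng = random.Random(0)
clean_train = [simulate_contest(rng) for _ in range(500)]
fraud_train = [simulate_contest(rng, fraud_share=0.3) for _ in range(500)]

clean_tables = fit_class_conditionals(clean_train)
fraud_tables = fit_class_conditionals(fraud_train)

# Score one held-out simulated contest; real returns would be scored the same way.
held_out = simulate_contest(rng, fraud_share=0.3)
p = posterior_fraud(held_out, clean_tables, fraud_tables)
print(f"Posterior probability of fraud: {p:.3f}")
```

Working in log space keeps the product of many small digit probabilities from underflowing, and the add-one smoothing prevents a digit that never appears in one training subset from forcing a posterior of exactly zero.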

Keywords: Electoral Fraud, Benford Law, Monte Carlo, Synthetic Data, Bayesian Analysis, Argentina

JEL Classification: C11, C15, C45, N46

Suggested Citation

Cantu, Francisco and Saiegh, Sebastian M., A Supervised Machine Learning Procedure to Detect Electoral Fraud Using Digital Analysis (April 22, 2010). Available at SSRN: https://ssrn.com/abstract=1594406 or http://dx.doi.org/10.2139/ssrn.1594406

Francisco Cantu

University of California, San Diego - Department of Political Science

9500 Gilman Drive
Code 0521
La Jolla, CA 92093-0521
United States

Sebastian M. Saiegh (Contact Author)

University of California, San Diego (UCSD) - Department of Political Science

9500 Gilman Drive
Code 0521
La Jolla, CA 92093-0521
United States
