A Supervised Machine Learning Procedure to Detect Electoral Fraud Using Digital Analysis
42 Pages. Posted: 23 Apr 2010
Date Written: April 22, 2010
Abstract
This paper introduces a naive Bayes classifier that detects electoral fraud from digit patterns in vote counts, using both authentic and synthetic data. The procedure is as follows: (1) we create 10,000 simulated electoral contests between two parties using Monte Carlo methods. This training set is composed of two disjoint subsets: one containing electoral returns that follow a Benford distribution, and another where the vote counts are purposively "manipulated" by electoral tampering, in which a percentage of votes is taken away from one party and given to the other; (2) we calibrate membership values of the simulated elections (i.e., clean or fraudulent) using logistic regression; (3) we recover class-conditional densities using the relative frequencies from the training set; (4) we apply Bayes' rule to class-conditional probabilities and class priors to establish the membership probabilities of authentic observations. To illustrate our technique, we examine elections in the province of Buenos Aires (Argentina) between 1932 and 1942, a period with a checkered history of fraud. Our analysis allows us to successfully classify electoral contests according to their degree of fraud. More generally, our findings indicate that Benford's Law is an effective tool for identifying fraud, even when minimal information (i.e., electoral returns) is available.
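The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not say which digit features are used, so the sketch takes the leading digit of each party's count in each electoral unit. It draws clean counts log-uniformly between 10^2 and 10^5 so that their leading digits follow Benford's law, uses 1,000 simulated contests instead of 10,000 to keep it fast, and omits the logistic-regression calibration of step (2). All function names, the 30% shift, and the equal class priors are hypothetical choices. The sketch makes no claim about how well such a classifier separates clean from manipulated contests, which depends on the features and the simulation design.

```python
import math
import random

DIGITS = range(1, 10)

def benford_pmf():
    """Benford's law for the leading digit: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in DIGITS}

def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def simulate_contest(rng, n_units, shift=0.0):
    """One two-party contest as a list of (votes_a, votes_b), one pair per unit.

    Clean counts are log-uniform between 10^2 and 10^5, which yields Benford
    leading digits (an assumption of this sketch). If shift > 0, that share of
    party A's votes in every unit is taken away and given to party B.
    """
    contest = []
    for _ in range(n_units):
        a = int(10 ** rng.uniform(2, 5))
        b = int(10 ** rng.uniform(2, 5))
        moved = int(shift * a)
        contest.append((a - moved, b + moved))
    return contest

def features(contest):
    """Leading digits of both parties' counts, tagged by party (0 = A, 1 = B)."""
    return [(p, leading_digit(v)) for unit in contest for p, v in enumerate(unit)]

def fit_class_conditionals(contests):
    """Step (3): relative frequency of each digit per party, add-one smoothed."""
    counts = {(p, d): 1 for p in (0, 1) for d in DIGITS}
    for contest in contests:
        for f in features(contest):
            counts[f] += 1
    totals = {p: sum(c for (q, _), c in counts.items() if q == p) for p in (0, 1)}
    return {f: c / totals[f[0]] for f, c in counts.items()}

def posterior_fraud(contest, cond_clean, cond_fraud, prior_fraud=0.5):
    """Step (4): Bayes' rule, treating digits as conditionally independent."""
    log_clean = math.log(1 - prior_fraud)
    log_fraud = math.log(prior_fraud)
    for f in features(contest):
        log_clean += math.log(cond_clean[f])
        log_fraud += math.log(cond_fraud[f])
    # Normalise in log space so long products of probabilities do not underflow.
    m = max(log_clean, log_fraud)
    e_clean = math.exp(log_clean - m)
    e_fraud = math.exp(log_fraud - m)
    return e_fraud / (e_clean + e_fraud)

# Step (1): a training set with two disjoint subsets, clean and manipulated.
rng = random.Random(0)
clean_train = [simulate_contest(rng, 50) for _ in range(500)]
fraud_train = [simulate_contest(rng, 50, shift=0.3) for _ in range(500)]

cond_clean = fit_class_conditionals(clean_train)
cond_fraud = fit_class_conditionals(fraud_train)

# Score a contest that was not part of the training set.
new_contest = simulate_contest(rng, 50, shift=0.3)
p = posterior_fraud(new_contest, cond_clean, cond_fraud)
print(f"P(fraud | digits) = {p:.3f}")
```

Working in log space keeps the product of many small class-conditional probabilities from underflowing, and add-one smoothing ensures that a digit never observed in training cannot drive a posterior to exactly zero.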
Keywords: Electoral Fraud, Benford's Law, Monte Carlo, Synthetic Data, Bayesian Analysis, Argentina
JEL Classification: C11, C15, C45, N46