Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator

Quiroz, Matias; Villani, Mattias; Kohn, Robert

doi:10.2139/ssrn.2706410


Sveriges Riksbank Working Paper Series No. 306

Riksbank Research Paper Series No. 130

32 Pages. Posted: 4 Jun 2016

Matias Quiroz

Sveriges Riksbank - Research Division; Stockholm University - Department of Statistics

Robert Kohn

University of New South Wales - School of Economics and School of Banking and Finance

Date Written: August 1, 2015

Abstract

We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling literature to estimate the log-likelihood accurately using only a small fraction of the data. Our algorithm improves on the O(n) complexity of regular MCMC by operating over local data clusters instead of the full sample when computing the likelihood. The likelihood estimate is used in a pseudo-marginal framework to sample from a perturbed posterior which is within O(m^(-1/2)) of the true posterior, where m is the subsample size. The method is applied to a logistic regression model to predict firm bankruptcy for a large data set. We document a significant speed-up in comparison to standard MCMC on the full dataset.
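
To illustrate the idea of a difference estimator for the log-likelihood, here is a minimal Python sketch for logistic regression. It is not the authors' implementation. The abstract describes approximations built from local data clusters; this sketch instead assumes, purely for illustration, a control variate q_k(theta) given by a first-order Taylor expansion of each log-likelihood term around a fixed reference value theta_star. The function names (loglik_terms, grad_terms, precompute, difference_estimate) and the use of simple random sampling with replacement are likewise assumptions. The log-likelihood is written as sum_k q_k(theta) + sum_k d_k(theta) with d_k = l_k - q_k; the first sum is available cheaply from precomputed totals, and only the second is estimated from a subsample of size m.

```python
import numpy as np


def loglik_terms(theta, X, y):
    """Per-observation logistic log-likelihood terms l_k(theta), y in {0, 1}."""
    z = X @ theta
    return y * z - np.logaddexp(0.0, z)


def grad_terms(theta, X, y):
    """Per-observation gradients of l_k with respect to theta, shape (n, p)."""
    prob = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (y - prob)[:, None] * X


def precompute(theta_star, X, y):
    """One O(n) pass at theta_star (assumed reference point).

    Returns the totals needed to evaluate sum_k q_k(theta) for any theta
    without touching the full data again.
    """
    sum_l = loglik_terms(theta_star, X, y).sum()
    sum_g = grad_terms(theta_star, X, y).sum(axis=0)
    return sum_l, sum_g


def difference_estimate(theta, X, y, theta_star, sum_l, sum_g, m, rng):
    """Unbiased estimate of the full-data log-likelihood from m observations."""
    n = X.shape[0]
    delta = theta - theta_star

    # Exact sum of the control variates over all n observations.
    q_sum = sum_l + sum_g @ delta

    # Simple random sample with replacement (an assumption of this sketch).
    idx = rng.integers(0, n, size=m)
    Xs, ys = X[idx], y[idx]

    # Differences d_k = l_k(theta) - q_k(theta) on the subsample only.
    q_sub = loglik_terms(theta_star, Xs, ys) + grad_terms(theta_star, Xs, ys) @ delta
    d_sub = loglik_terms(theta, Xs, ys) - q_sub

    # n * mean(d_sub) is unbiased for sum_k d_k(theta).
    return q_sum + n * d_sub.mean()
```

The estimator's variance depends on how small the differences d_k are, so it is most accurate when theta is close to theta_star. Plugging such an estimate into a pseudo-marginal sampler involves further steps that this sketch does not cover.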

Keywords: Bayesian inference, Markov Chain Monte Carlo, Pseudo-marginal MCMC, estimated likelihood, GLM for large data

JEL Classification: C11, C13, C15, C55, C83

Suggested Citation

Quiroz, Matias and Villani, Mattias and Kohn, Robert, Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator (August 1, 2015). Sveriges Riksbank Working Paper Series No. 306, Riksbank Research Paper Series No. 130, Available at SSRN: https://ssrn.com/abstract=2706410 or http://dx.doi.org/10.2139/ssrn.2706410