Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator

Sveriges Riksbank Working Paper Series No. 306

Riksbank Research Paper Series No. 130

32 Pages. Posted: 4 Jun 2016

Matias Quiroz

Sveriges Riksbank - Research Division; Stockholm University - Department of Statistics

Mattias Villani

Linkoping University

Robert Kohn

University of New South Wales - School of Economics and School of Banking and Finance

Date Written: August 1, 2015

Abstract

We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling literature to estimate the log-likelihood accurately using only a small fraction of the data. Our algorithm improves on the O(n) complexity of regular MCMC by operating over local data clusters instead of the full sample when computing the likelihood. The likelihood estimate is used in a pseudo-marginal framework to sample from a perturbed posterior which is within O(m^{-1/2}) of the true posterior, where m is the subsample size. The method is applied to a logistic regression model to predict firm bankruptcy for a large data set. We document a significant speed-up in comparison to standard MCMC on the full dataset.
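
To make the idea of the difference estimator concrete, the following is a minimal Python sketch for a logistic regression log-likelihood. It is an illustration under stated assumptions, not the authors' implementation. In particular, the control variates here are a first-order Taylor expansion in the parameter around a fixed reference value `theta_ref`, which is a simpler construction swapped in for the cluster-based approximations the abstract refers to. All function names, the simulated data, and the choice of simple random sampling with replacement are assumptions of this sketch.

```python
import numpy as np

def loglik_terms(theta, X, y):
    """Per-observation logistic log-likelihood contributions l_i(theta)."""
    z = X @ theta
    return y * z - np.logaddexp(0.0, z)

def taylor_control_variates(theta_ref, X, y):
    """Precompute l_i and its gradient at theta_ref (one O(n) pass, done once).

    Assumption of this sketch: q_i(theta) = l_i(theta_ref) + grad_i' (theta - theta_ref).
    """
    z = X @ theta_ref
    l_ref = y * z - np.logaddexp(0.0, z)
    p = np.exp(-np.logaddexp(0.0, -z))       # numerically stable sigmoid(z)
    grad_ref = (y - p)[:, None] * X          # shape (n, d)
    return {
        "theta_ref": theta_ref,
        "l_ref": l_ref,
        "grad_ref": grad_ref,
        "sum_l": l_ref.sum(),                # cached full-data sums
        "sum_grad": grad_ref.sum(axis=0),
    }

def difference_estimate(theta, X, y, cv, m, rng):
    """Estimate sum_i l_i(theta) as sum_i q_i + (n/m) * sum over subsample of (l_u - q_u)."""
    n = X.shape[0]
    delta = theta - cv["theta_ref"]
    q_total = cv["sum_l"] + cv["sum_grad"] @ delta      # no pass over the data
    idx = rng.integers(0, n, size=m)                    # sampling with replacement
    q_sub = cv["l_ref"][idx] + cv["grad_ref"][idx] @ delta
    d_sub = loglik_terms(theta, X[idx], y[idx]) - q_sub
    return q_total + n * d_sub.mean()

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
n, d, m = 10_000, 3, 100
X = rng.normal(size=(n, d))
theta_true = np.array([0.5, -1.0, 0.25])
prob = np.exp(-np.logaddexp(0.0, -(X @ theta_true)))
y = (rng.random(n) < prob).astype(float)

theta_ref = theta_true.copy()   # stand-in for, e.g., a preliminary point estimate
cv = taylor_control_variates(theta_ref, X, y)
theta = theta_ref + 0.05        # a nearby proposal

estimate = difference_estimate(theta, X, y, cv, m, rng)
full = loglik_terms(theta, X, y).sum()
print(f"estimate from m={m}: {estimate:.2f}, full-data value: {full:.2f}")
```

The estimator is unbiased for the log-likelihood, and its variance is small whenever the control variates track the individual terms closely. Exponentiating an unbiased log-likelihood estimate does not give an unbiased likelihood estimate, which is consistent with the abstract's statement that the sampler targets a perturbed posterior.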

Keywords: Bayesian inference, Markov Chain Monte Carlo, Pseudo-marginal MCMC, estimated likelihood, GLM for large data

JEL Classification: C11, C13, C15, C55, C83

Suggested Citation

Quiroz, Matias and Villani, Mattias and Kohn, Robert, Scalable MCMC for Large Data Problems Using Data Subsampling and the Difference Estimator (August 1, 2015). Sveriges Riksbank Working Paper Series No. 306, Riksbank Research Paper Series No. 130, Available at SSRN: https://ssrn.com/abstract=2706410 or http://dx.doi.org/10.2139/ssrn.2706410

Matias Quiroz (Contact Author)

Sveriges Riksbank - Research Division

S-103 37 Stockholm
Sweden

Stockholm University - Department of Statistics

Stockholm, SE-106 91
Sweden

Mattias Villani

Linkoping University

Överstegatan 30
Linkoping, 581 83
Sweden

Robert Kohn

University of New South Wales - School of Economics and School of Banking and Finance

Australian School of Business
Sydney NSW 2052, ACT 2600
Australia
+61 2 9385 2150 (Phone)
