Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines

27 Pages. Posted: 24 Feb 2020


Alexei Kondratyev

Standard Chartered Bank

Christian Schwarz

affiliation not provided to SSRN

Blanka Horvath

ETH Zürich - Department of Mathematics

Date Written: January 27, 2020

Abstract

We propose a novel approach to the anonymisation of datasets through non-parametric learning of the underlying multivariate distribution of dataset features and generation of new synthetic samples from the learned distribution. The main objective is to ensure equal (or better) performance of the classifiers and regressors trained on synthetic datasets in comparison with the same classifiers and regressors trained on the original data. The ability to generate an unlimited number of synthetic data samples from the learned distribution can be a remedy in fighting overfitting when dealing with small original datasets. When the synthetic data generator is trained as an autoencoder with a bottleneck information compression structure, we can also expect to see a reduced number of outliers in the generated datasets, thus further improving the generalization capabilities of the classifiers trained on synthetic data. We achieve these objectives with the help of the Restricted Boltzmann Machine, a special type of generative neural network that possesses all the required properties of a powerful data anonymiser.
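
As a rough illustration of the idea in the abstract (learn the joint distribution of the features, then draw new samples from it), the following is a minimal sketch of a binary Restricted Boltzmann Machine in NumPy. It is not the authors' implementation: the class name `BernoulliRBM`, the use of one-step contrastive divergence (CD-1), the toy dataset, and all hyperparameters (`n_hidden`, `epochs`, `lr`, `n_gibbs`) are assumptions chosen for illustration, and the sketch assumes every feature has already been encoded as 0/1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    """Tiny binary RBM trained with one-step contrastive divergence (CD-1).

    Illustrative only: names and hyperparameters are assumptions,
    not taken from the paper.
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def sample_hidden(self, v):
        # P(h = 1 | v) and a binary sample drawn from it
        p = sigmoid(v @ self.W + self.c)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        # P(v = 1 | h) and a binary sample drawn from it
        p = sigmoid(h @ self.W.T + self.b)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def fit(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            ph0, h0 = self.sample_hidden(data)  # positive phase
            _, v1 = self.sample_visible(h0)     # one Gibbs step
            ph1, _ = self.sample_hidden(v1)     # negative phase
            self.W += lr * (data.T @ ph0 - v1.T @ ph1) / n
            self.b += lr * (data - v1).mean(axis=0)
            self.c += lr * (ph0 - ph1).mean(axis=0)
        return self

    def generate(self, n_samples, n_gibbs=100):
        # Start from random binary vectors and run a Gibbs chain
        v = (self.rng.random((n_samples, self.b.size)) < 0.5).astype(float)
        for _ in range(n_gibbs):
            _, h = self.sample_hidden(v)
            _, v = self.sample_visible(h)
        return v

# Toy "original" dataset: three binary features driven by one hidden coin flip
rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=(500, 1))
original = np.hstack([a, a, 1 - a]).astype(float)

rbm = BernoulliRBM(n_visible=3, n_hidden=4).fit(original)
synthetic = rbm.generate(1000)  # as many synthetic rows as needed
```

The synthetic rows are drawn from the model rather than copied from the original records, and a classifier could then be trained on `synthetic` in place of `original`. How closely the synthetic data reproduce the original dependence structure depends on training length and model size, so in practice the two datasets would need to be compared before use.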

Keywords: Restricted Boltzmann Machine, non-parametric sampling, synthetic data generation, data anonymisation, detection of outliers, reduction of overfitting

JEL Classification: C63, G17

Suggested Citation

Kondratyev, Alexei and Schwarz, Christian and Horvath, Blanka, Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines (January 27, 2020). Available at SSRN: https://ssrn.com/abstract=3526436 or http://dx.doi.org/10.2139/ssrn.3526436

Alexei Kondratyev (Contact Author)

Standard Chartered Bank

1 Basinghall Avenue
London, EC2V 5DD
United Kingdom

Christian Schwarz

affiliation not provided to SSRN

Blanka Horvath

ETH Zürich - Department of Mathematics

Rämistrasse 101
Zürich, 8092
Switzerland

