Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines

27 Pages Posted: 24 Feb 2020

Alexei Kondratyev

Abu Dhabi Investment Authority

Christian Schwarz

affiliation not provided to SSRN

Blanka Horvath

ETH Zürich - Department of Mathematics

Date Written: January 27, 2020

Abstract

We propose a novel approach to the anonymisation of datasets through non-parametric learning of the underlying multivariate distribution of dataset features and generation of new synthetic samples from the learned distribution. The main objective is to ensure equal (or better) performance of classifiers and regressors trained on synthetic datasets in comparison with the same classifiers and regressors trained on the original data. The ability to generate an unlimited number of synthetic data samples from the learned distribution can be a remedy for overfitting when dealing with small original datasets. When the synthetic data generator is trained as an autoencoder with a bottleneck information compression structure, we can also expect to see a reduced number of outliers in the generated datasets, thus further improving the generalization capabilities of classifiers trained on synthetic data. We achieve these objectives with the help of the Restricted Boltzmann Machine, a special type of generative neural network that possesses all the required properties of a powerful data anonymiser.
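
As a rough illustration of the idea described in the abstract (learn a distribution from data, then draw new samples from it), the following is a minimal sketch of a binary Restricted Boltzmann Machine. It is not the authors' implementation: the restriction to binary features, the one-step contrastive divergence (CD-1) training rule, the toy dataset, and every name and hyperparameter (`BernoulliRBM`, `n_hidden`, `lr`, `n_gibbs`) are assumptions made for this example only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Tiny binary RBM trained with one-step contrastive divergence (CD-1).

    Hypothetical illustration only; not the model from the paper.
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases

    def sample_hidden(self, v):
        # Activation probabilities of hidden units and a binary sample of them.
        p = sigmoid(v @ self.W + self.b)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def sample_visible(self, h):
        # Activation probabilities of visible units and a binary sample of them.
        p = sigmoid(h @ self.W.T + self.a)
        return p, (self.rng.random(p.shape) < p).astype(float)

    def fit(self, data, epochs=200, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            ph0, h0 = self.sample_hidden(data)  # positive phase (data-driven)
            _, v1 = self.sample_visible(h0)     # one Gibbs step: reconstruction
            ph1, _ = self.sample_hidden(v1)     # negative phase (model-driven)
            self.W += lr * (data.T @ ph0 - v1.T @ ph1) / n
            self.a += lr * (data - v1).mean(axis=0)
            self.b += lr * (ph0 - ph1).mean(axis=0)
        return self

    def generate(self, n_samples, n_gibbs=50):
        # Start from random binary vectors and run alternating Gibbs sampling.
        v = (self.rng.random((n_samples, self.a.size)) < 0.5).astype(float)
        for _ in range(n_gibbs):
            _, h = self.sample_hidden(v)
            _, v = self.sample_visible(h)
        return v


# Toy "original" dataset (assumed): 200 rows, 4 binary features, where
# feature 1 copies feature 0, feature 2 negates it, and feature 3 is noise.
rng = np.random.default_rng(42)
base = rng.integers(0, 2, size=(200, 1))
noise = rng.integers(0, 2, size=(200, 1))
original = np.hstack([base, base, 1 - base, noise]).astype(float)

rbm = BernoulliRBM(n_visible=4, n_hidden=3).fit(original)
synthetic = rbm.generate(n_samples=1000)  # more rows than the original data
print(synthetic.shape)
```

The synthetic rows are drawn from the trained model rather than copied from `original`, and the number of generated rows is independent of the size of the training set. Whether a classifier trained on such samples matches one trained on the original data would have to be checked empirically; this sketch does not attempt that.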

Keywords: Restricted Boltzmann Machine, non-parametric sampling, synthetic data generation, data anonymisation, detection of outliers, reduction of overfitting

JEL Classification: C63, G17

Suggested Citation

Kondratyev, Alexei and Schwarz, Christian and Horvath, Blanka, Data Anonymisation, Outlier Detection and Fighting Overfitting with Restricted Boltzmann Machines (January 27, 2020). Available at SSRN: https://ssrn.com/abstract=3526436 or http://dx.doi.org/10.2139/ssrn.3526436

Alexei Kondratyev (Contact Author)

Abu Dhabi Investment Authority

Abu Dhabi
United Arab Emirates

Christian Schwarz

affiliation not provided to SSRN

Blanka Horvath

ETH Zürich - Department of Mathematics

Rämistrasse 101
Zürich, 8092
Switzerland
