Fairer Machine Learning in the Real World: Mitigating Discrimination Without Collecting Sensitive Data

Big Data & Society 4(2), doi:10.1177/2053951717743530

17 Pages. Posted: 30 Oct 2017. Last revised: 10 Dec 2017

Michael Veale

University College London, Faculty of Laws; Alan Turing Institute

Reuben Binns

University of Oxford

Date Written: October 27, 2017

Abstract

Decisions based on algorithmic, machine learning models can be unfair, reproducing biases in historical data used to train them. While computational techniques are emerging to address aspects of these concerns through communities such as discrimination-aware data mining (DADM) and fair, accountable and transparent machine learning (FATML), their practical implementation faces real-world challenges. For legal, institutional or commercial reasons, organisations might not hold the data on sensitive attributes such as gender, ethnicity, sexuality or disability needed to diagnose and mitigate emergent indirect discrimination-by-proxy, such as redlining. Such organisations might also lack the knowledge and capacity to identify and manage fairness issues that are emergent properties of complex sociotechnical systems.

This paper presents and discusses three potential approaches to deal with such knowledge and information deficits in the context of fairer machine learning. Trusted third parties could selectively store data necessary for performing discrimination discovery and incorporating fairness constraints into model-building in a privacy-preserving manner. Collaborative online platforms would allow diverse organisations to record, share and access contextual and experiential knowledge to promote fairness in machine learning systems. Finally, unsupervised learning and pedagogically interpretable algorithms might allow fairness hypotheses to be built for further selective testing and exploration.

Real-world fairness challenges in machine learning are not abstract, constrained optimisation problems, but are institutionally and contextually grounded. Computational fairness tools are useful, but must be researched and developed in and with the messy contexts that will shape their deployment, rather than just for imagined situations. Not doing so risks real, near-term algorithmic harm.

Keywords: machine learning, algorithmic accountability, discrimination-aware data mining, fairness-aware machine learning, protected characteristics, anti-discrimination law, FATML, algorithmic regulation, algorithmic bias, data protection, GDPR, discrimination, redlining

Suggested Citation

Veale, Michael and Binns, Reuben, Fairer Machine Learning in the Real World: Mitigating Discrimination Without Collecting Sensitive Data (October 27, 2017). Big Data & Society 4(2), doi:10.1177/2053951717743530. Available at SSRN: https://ssrn.com/abstract=3060763

Michael Veale (Contact Author)

University College London, Faculty of Laws

Gower St
London, WC1E 6BT
United Kingdom

Alan Turing Institute

96 Euston Road
London, NW1 2DB
United Kingdom

Reuben Binns

University of Oxford

Mansfield Road
Oxford, Oxfordshire OX1 4AU
United Kingdom
