Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

40 Pages. Posted: 7 Jul 2023


Vincent Jeanselme

University of Cambridge - MRC Biostatistics Unit; The Alan Turing Institute

Maria De-Arteaga

University of Texas at Austin - Department of Information, Risk, and Operations Management

Zhe Zhang

University of California, San Diego (UCSD) - Rady School of Management

Brian D. M. Tom

University of Cambridge - MRC Biostatistics Unit

Jessica Barrett

University of Cambridge - MRC Biostatistics Unit

Date Written: June 30, 2023

Abstract

Machine learning risks reinforcing biases present in data and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care for marginalised groups. Patterns of missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step: attention is placed on reducing reconstruction error and improving overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and the algorithmic fairness properties of downstream predictions. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns.

Then, we theoretically demonstrate that the optimal choice between two common imputation strategies is under-determined, both in terms of group-specific imputation quality and of the gap in quality across groups. In particular, the use of group-specific imputation strategies may counter-intuitively reduce data quality for marginalised groups.
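To build intuition for this under-determination, here is a minimal simulation sketch. It is our own illustration, not the paper's construction: the abstract does not name the two strategies, so population-level versus group-specific mean imputation is assumed, and the missingness mechanisms are invented for the example. The better strategy for the marginalised group flips with the mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# One feature for a majority group (g = 0) and a smaller,
# differently distributed marginalised group (g = 1).
n0, n1 = 5000, 200
x = np.r_[rng.normal(0.0, 1.0, n0), rng.normal(1.0, 1.0, n1)]
g = np.r_[np.zeros(n0, int), np.ones(n1, int)]

def per_group_mse(mask):
    """Reconstruction MSE on masked entries, per group, for
    population-level vs. group-specific mean imputation."""
    pop_mean = x[~mask].mean()                        # one mean for everyone
    grp_mean = np.array([x[(g == k) & ~mask].mean()   # one mean per group
                         for k in (0, 1)])
    errors = {}
    for name, imputed in [("population", np.full_like(x, pop_mean)),
                          ("group-specific", grp_mean[g])]:
        errors[name] = [round(np.mean((imputed - x)[mask & (g == k)] ** 2), 2)
                        for k in (0, 1)]
    return errors

# Scenario A: values missing completely at random in both groups.
mcar = rng.random(x.size) < 0.3

# Scenario B: in the marginalised group only, low values are far more
# likely to go unrecorded -- an informative-missingness pattern of the
# kind clinical presence can induce.
mnar = rng.random(x.size) < np.where((g == 1) & (x < 1.0), 0.9, 0.1)

for label, mask in [("MCAR", mcar), ("MNAR", mnar)]:
    print(label, per_group_mse(mask))
# Under MCAR, group-specific imputation reconstructs the marginalised
# group better; under this MNAR pattern the ordering reverses, because
# the group mean is now estimated from a biased (high) subsample.
```

In scenario B the observed values in the marginalised group skew high, so its group-specific mean sits far from the values that were actually masked, and the population mean happens to land closer: which strategy serves the group better depends on a mechanism the analyst typically cannot verify.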

We complement these theoretical results with simulations and real-world empirical evidence showing that imputation choices influence group-specific data quality and downstream algorithmic fairness, and that no imputation strategy consistently reduces group disparities in reconstruction error or predictions. Importantly, our results show that current practices may be detrimental to health equity, as imputation strategies that perform similarly at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating the inequities that may stem from this overlooked step of the machine learning pipeline.
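As a companion to the data-quality check above, one might audit a downstream fairness property separately for each imputation strategy. A minimal sketch, assuming scikit-learn is available and using the true-positive-rate gap as an illustrative disparity metric; this is not the paper's experimental protocol:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

def tpr_gap(X_imputed, y, g, seed=0):
    """Absolute gap in true-positive rate between two groups for a
    simple classifier trained on one imputed dataset (a toy audit)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                     # random train/test split
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    model = LogisticRegression().fit(X_imputed[train], y[train])
    pred = model.predict(X_imputed[test])
    tpr = [recall_score(y[test][g[test] == k], pred[g[test] == k])
           for k in (0, 1)]
    return abs(tpr[0] - tpr[1])
```

Running tpr_gap on each imputed version of the same dataset (for instance, the single feature from the previous sketch reshaped to a column, with a simulated outcome y) makes visible how strategies with near-identical overall accuracy can still yield different group disparities.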

Note:
Funding Information: Vincent Jeanselme acknowledges the support of The Alan Turing Institute’s Enrichment Scheme and the partial support of the UKRI Medical Research Council (MC_UU_00002/5 and MC_UU_00002/2). Maria De-Arteaga acknowledges the support of NIH through grant R01NS124642.

Conflict of Interests: The authors declare that they have no conflict of interest.

Keywords: Algorithmic fairness, Healthcare, Imputation, Missingness

Suggested Citation

Jeanselme, Vincent and De-Arteaga, Maria and Zhang, Zhe and Tom, Brian D. M. and Barrett, Jessica, Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness (June 30, 2023). Available at SSRN: https://ssrn.com/abstract=4496874 or http://dx.doi.org/10.2139/ssrn.4496874

Vincent Jeanselme (Contact Author)

University of Cambridge - MRC Biostatistics Unit ( email )

Cambridge
United Kingdom

The Alan Turing Institute ( email )

British Library, 96 Euston Road
London, NW1 2DB
United Kingdom

Maria De-Arteaga

University of Texas at Austin - Department of Information, Risk, and Operations Management ( email )

United States

Zhe Zhang

University of California, San Diego (UCSD) - Rady School of Management ( email )

9500 Gilman Drive
Rady School of Management
La Jolla, CA 92093
United States

Brian D. M. Tom

University of Cambridge - MRC Biostatistics Unit ( email )

Cambridge, CB2 0SR
United Kingdom

Jessica Barrett

University of Cambridge - MRC Biostatistics Unit

