Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness
40 Pages
Posted: 7 Jul 2023
Date Written: June 30, 2023
Abstract
Machine learning risks reinforcing biases present in data, and, as we argue in this work, in what is absent from data. In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups. Patterns in missing data often reflect these group discrepancies, but the algorithmic fairness implications of group-specific missingness are not well understood. Despite its potential impact, imputation is often an overlooked preprocessing step, with attention placed on the reduction of reconstruction error and overall performance, ignoring how imputation can affect groups differently. Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions. First, we provide a structured view of the relationship between clinical presence mechanisms and group-specific missingness patterns.
Then, we theoretically demonstrate that the optimal choice between two common imputation strategies is under-determined, both in terms of group-specific imputation quality and of the gap in quality across groups. In particular, the use of group-specific imputation strategies may counter-intuitively reduce data quality for marginalised groups.
We complement these theoretical results with simulations and real-world empirical evidence showing that imputation choices influence group-specific data quality and downstream algorithmic fairness, and that no imputation strategy consistently reduces group disparities in reconstruction error or predictions. Importantly, our results show that current practices may be detrimental to health equity as similarly performing imputation strategies at the population level can affect marginalised groups differently. Finally, we propose recommendations for mitigating inequities that may stem from an overlooked step of the machine learning pipeline.
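As an illustration of the claim that group-specific imputation can reduce data quality for a marginalised group, the following is a minimal sketch and not the authors' experimental setup. It assumes the two strategies are population mean and group mean imputation (the abstract does not name them), and every name, parameter, and missingness mechanism below is hypothetical. The minority group's values are hidden whenever they are high, so its observed mean is biased, while the majority group's values are missing completely at random.

```python
import numpy as np


def simulate(n=20000, minority_frac=0.1, seed=0):
    """Simulate one covariate with group-specific missingness (illustrative only)."""
    rng = np.random.default_rng(seed)
    group = (rng.random(n) < minority_frac).astype(int)  # 1 = minority, 0 = majority
    # Assumed group means: 1.0 for the majority, 0.0 for the minority.
    x = rng.normal(loc=np.where(group == 1, 0.0, 1.0), scale=1.0)
    # Majority: 20% missing completely at random.
    # Minority: missing whenever the value is above 0 (value-dependent missingness).
    missing = np.where(group == 1, x > 0.0, rng.random(n) < 0.2)
    return x, group, missing


def impute_population_mean(x, group, missing):
    """Fill every missing entry with the mean of all observed entries."""
    filled = x.copy()
    filled[missing] = x[~missing].mean()
    return filled


def impute_group_mean(x, group, missing):
    """Fill each missing entry with the observed mean of its own group."""
    filled = x.copy()
    for g in np.unique(group):
        in_g = group == g
        filled[in_g & missing] = x[in_g & ~missing].mean()
    return filled


def rmse_by_group(x, filled, group, missing):
    """Reconstruction RMSE on the missing entries, reported per group."""
    out = {}
    for g in np.unique(group):
        idx = (group == g) & missing
        out[int(g)] = float(np.sqrt(np.mean((x[idx] - filled[idx]) ** 2)))
    return out


x, group, missing = simulate()
pop_rmse = rmse_by_group(x, impute_population_mean(x, group, missing), group, missing)
grp_rmse = rmse_by_group(x, impute_group_mean(x, group, missing), group, missing)
print("population mean imputation:", pop_rmse)
print("group mean imputation:     ", grp_rmse)
```

Under these assumed mechanisms, group mean imputation fills the minority group's missing (high) values with a mean computed only from its observed (low) values, so its reconstruction error for that group exceeds that of population mean imputation. Changing the group means or the missingness mechanism can reverse the ordering, which is the sense in which the choice is under-determined.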
Note:
Funding Information: Vincent Jeanselme acknowledges the support of The Alan Turing Institute’s Enrichment Scheme and the partial support of the UKRI Medical Research Council (MC UU 00002/5 and MC UU 00002/2). Maria De-Arteaga acknowledges the support of NIH through grant R01NS124642.
Conflict of Interests: The authors declare that they have no conflict of interest.
Keywords: Algorithmic fairness, Healthcare, Imputation, Missingness