Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases

Posted: 26 Aug 2022

Su Golder

University of York

Karen O’Connor

University of Pennsylvania - Department of Biostatistics, Epidemiology, and Informatics

Yunwen Wang

University of California, Los Angeles (UCLA)

Robin Stevens

University of California, Los Angeles (UCLA)

Graciela Gonzalez Hernandez

University of Pennsylvania - Department of Biostatistics, Epidemiology, and Informatics

Date Written: August 1, 2022

Abstract

A bias in health research toward understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques used to interrogate big data to identify sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) “women,” “men,” or “sex”; (b) “big data,” “artificial intelligence,” or “NLP”; and (c) “disparities” or “differences.” From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that inclusion by sex is inconsistent and often unreported, although men are disproportionately underrepresented relative to women in these studies. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but progress toward correction is slow. We reflect on best practices for using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.

Suggested Citation

Golder, Su and O’Connor, Karen and Wang, Yunwen and Stevens, Robin and Gonzalez Hernandez, Graciela, Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases (August 1, 2022). Annual Review of Biomedical Data Science, Vol. 5, pp. 251-267, 2022, Available at SSRN: https://ssrn.com/abstract=4200112 or http://dx.doi.org/10.1146/annurev-biodatasci-122120-025806

Su Golder (Contact Author)

University of York

Thessaloniki
Greece

Karen O’Connor

University of Pennsylvania - Department of Biostatistics, Epidemiology, and Informatics

Yunwen Wang

University of California, Los Angeles (UCLA)

405 Hilgard Avenue
Box 951361
Los Angeles, CA 90095
United States

Robin Stevens

University of California, Los Angeles (UCLA)

405 Hilgard Avenue
Box 951361
Los Angeles, CA 90095
United States

Graciela Gonzalez Hernandez

University of Pennsylvania - Department of Biostatistics, Epidemiology, and Informatics
