
Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early-stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact preprints@lancet.com.
Automated Machine Learning Optimizes and Accelerates COVID-19 Predictive Modeling
33 Pages Posted: 23 Apr 2021
Abstract
The rapid outbreak of COVID-19 has put intense pressure on healthcare systems, creating an urgent demand for effective diagnostic, prognostic and therapeutic procedures. Despite the global scientific effort, there is a lack of efficient predictive models for patient stratification and successful management of the disease.
Here, we employed Automated Machine Learning (AutoML) to analyze three publicly available COVID-19 datasets, including serum proteomic, metabolomic and transcriptomic measurements. Pathway analysis of the selected features was also performed.
Analysis of a combined proteomic and metabolomic dataset produced ten equivalent signatures of two features each, with an AUC of 0.840 (CI 0.723–0.941) in discriminating severe from non-severe COVID-19 patients. A transcriptomic dataset led to two equivalent signatures of eight features each, with an AUC of 0.914 (CI 0.865–0.955) in distinguishing COVID-19 patients from those with a different acute respiratory illness. A second transcriptomic dataset led to two equivalent signatures of nine features each, with an AUC of 0.967 (CI 0.899–0.996) in distinguishing COVID-19 patients from virus-free individuals. Multiple new features emerged that are implicated in a wide range of pathways, including viral mRNA translation, interferon gamma signaling and the innate immune system.
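The abstract reports each AUC with a confidence interval but does not say how the intervals were obtained. As an illustration only, the sketch below computes an AUC from predicted scores and attaches a percentile bootstrap interval. The choice of the percentile bootstrap, the function names `auc` and `bootstrap_auc_ci`, and all parameter defaults are assumptions made for this example; the authors' AutoML pipeline may use a different procedure.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample scores higher, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (illustrative)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class, AUC undefined: redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Calling `auc(labels, scores)` on held-out predictions gives the point estimate, and `bootstrap_auc_ci(labels, scores)` gives a 95% interval under the default `alpha`. With small validation cohorts the interval will be wide, which is consistent with the broader interval reported for the smallest signature above.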
In conclusion, applying AutoML produced multiple biosignatures quickly and automatically, each with a small number of features and high predictive performance that remained high upon validation. These favorable characteristics are important for the further development of cost-effective clinical assays that could contribute to better disease management. Our results also highlight the importance of revisiting valuable, well-built datasets to extract as many conclusions as possible from a given experimental observation.
Funding Statement: No funding was received for this research.
Declaration of Interests: GP, MK, and NT are employees of Gnosis Data Analysis that offers the JADBio service commercially. IT and VL are co-founders of Gnosis Data Analysis that offers the JADBio service commercially and members of its scientific advisory board.
Keywords: COVID-19, automated Machine Learning, SARS-CoV-2, modeling, predictive models, validation