Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact firstname.lastname@example.org.
The Benefit of Augmenting Open Data with Clinical Data-Warehouse EHR for Forecasting SARS-CoV-2 Hospitalizations in Bordeaux Area, France
25 Pages Posted: 31 Mar 2022
Background: The ability to anticipate the evolution of the SARS-CoV-2 pandemic, and especially the number of hospitalizations over a short time horizon, is critical for organizing the health care system. Several forecast models relying on public data sources have been proposed. In this work, we hypothesized that forecasts could be improved by enriching these public data with data from a hospital data warehouse, including ambulance service and emergency unit reports. The objective was to predict the number of hospitalized patients one and two weeks ahead in one of the main regional hospitals in Southwestern France.
Methods: Aggregated data from public SARS-CoV-2 and weather databases and from the Bordeaux Hospital data warehouse were extracted for the period from 2020-05-16 to 2022-01-17. The outcomes were the number of hospitalized patients in the Bordeaux Hospital at 7 and 14 days. We compared the performance of different data sources, feature engineering strategies and machine learning models, including elastic-net penalized regression, random forest and Fréchet random forest.
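The forecasting task described in Methods can be framed as supervised learning: features known on day t (recent hospitalization counts plus other covariates) are used to predict the count on day t + 7 or t + 14. What follows is a minimal, pure-Python sketch under stated assumptions, not the authors' pipeline: the helper names (`make_supervised`, `fit_elastic_net`, `predict`), the use of raw lagged counts as features, and the default `n_lags`, `alpha` and `l1_ratio` values are all hypothetical. The penalty follows the common elastic-net parametrization (the one documented for scikit-learn's `ElasticNet`) and is fitted here by cyclic coordinate descent.

```python
def make_supervised(series, covariates, horizon, n_lags=3):
    """Hypothetical helper: pair features known on day t with the count on day t + horizon."""
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        lags = [series[t - k] for k in range(n_lags)]  # counts on days t, t-1, ...
        X.append(lags + list(covariates[t]))  # e.g. weather or emergency-unit counts (assumed)
        y.append(series[t + horizon])
    return X, y


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal operator)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Minimise (1/2n)||y - Xw - b||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre features and target; the intercept is recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centred target with w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            denom = z + alpha * (1 - l1_ratio)
            new_wj = soft_threshold(rho, alpha * l1_ratio) / denom if denom > 0 else 0.0
            # Update the residual for the change in coefficient j only.
            resid = [r - c * (new_wj - w[j]) for c, r in zip(col, resid)]
            w[j] = new_wj
    intercept = y_mean - sum(m * wj for m, wj in zip(x_mean, w))
    return intercept, w


def predict(intercept, w, x):
    """Linear forecast for one feature row."""
    return intercept + sum(wj * xj for wj, xj in zip(w, x))
```

In practice, features would usually be standardized before fitting, and `alpha` and `l1_ratio` chosen by time-ordered cross-validation; both steps are omitted here for brevity.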
Findings: During the 88-week study period, 2561 hospitalizations due to COVID-19 were recorded at the Bordeaux Hospital. The best-performing model was an elastic-net penalized linear regression using all available data, with a median absolute error (MAE) at 7 and 14 days of 6·41 [6·07 ; 6·81] and 10·11 [9·54 ; 10·65] hospitalizations, respectively. Adding electronic health records from the hospital data warehouse reduced the median absolute error at 7 and 14 days by around 17%. Graphical evaluation showed that the remaining forecast error was mainly due to delays in detecting changes in slope.
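The headline metric and the roughly 17% gain can be expressed as two small functions. This is a minimal sketch assuming that MAE is the median of the absolute forecast errors over the evaluation period and that the gain is the relative reduction in MAE compared with a model without the hospital data; both definitions are our reading of the abstract, and the function names are hypothetical.

```python
from statistics import median


def median_absolute_error(observed, predicted):
    """Median of the absolute forecast errors |observed - predicted|."""
    return median(abs(o - p) for o, p in zip(observed, predicted))


def relative_improvement(mae_without_ehr, mae_with_ehr):
    """Fractional reduction in error when hospital EHR features are added (assumed definition)."""
    return (mae_without_ehr - mae_with_ehr) / mae_without_ehr


# Toy numbers for illustration only (not study data).
obs = [40, 42, 45, 50, 48]
pred = [38, 43, 49, 50, 45]
print(median_absolute_error(obs, pred))  # prints 2
```

Unlike the mean, the median is insensitive to a handful of large misses, such as those that can occur around changes in slope.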
Interpretation: The forecast model showed good overall performance at both 7 and 14 days, and its performance improved when data from the Bordeaux Hospital data warehouse were added. However, shifts in the epidemic dynamics during each infection wave remained difficult to predict.
Funding: This work has been partly supported by Inria, Mission COVID19, GESTEPID project and Nouvelle Aquitaine regional funding (Prediction territorial COVID N°1333140).
Declaration of Interest: We declare no competing interests.
Ethical Approval: No ethics committee approval was needed for this work, as the data used for modelling were aggregated.
Keywords: SARS-CoV-2, forecasting, electronic health records, Data Warehouse, machine learning