
Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact preprints@lancet.com.

Fusing a Bayesian Case Velocity Model with Random Forest for Predicting COVID-19 in the U.S.

51 Pages Posted: 11 Jun 2020

Gregory L. Watson

University of California, Los Angeles (UCLA) - Department of Biostatistics

Di Xiong

University of California, Los Angeles (UCLA) - Department of Biostatistics

Lu Zhang

University of California, Los Angeles (UCLA) - Department of Biostatistics

Joseph A. Zoller

University of California, Los Angeles (UCLA) - Department of Biostatistics

John Shamshoian

University of California, Los Angeles (UCLA) - Department of Biostatistics

Phillip Sundin

University of California, Los Angeles (UCLA) - Department of Biostatistics

Teresa Bufford

University of California, Los Angeles (UCLA) - Department of Biostatistics

Anne W. Rimoin

University of California, Los Angeles (UCLA) - Department of Epidemiology

Marc A. Suchard

University of California, Los Angeles (UCLA) - David Geffen School of Medicine

Christina M. Ramirez

University of California, Los Angeles (UCLA) - Department of Biostatistics

Abstract

Background: Predictions of COVID-19 case growth and mortality are critical to the decisions of political leaders, businesses, and individuals grappling with the pandemic. This predictive task is challenging due to the novelty of the virus, limited data, and dynamic political and societal responses.

Methods: We embed a Bayesian nonlinear mixed model and a random forest algorithm within an epidemiological compartmental model for empirically grounded COVID-19 predictions. The Bayesian case model fits a location-specific curve to the velocity (first derivative) of the transformed cumulative case count, borrowing strength across geographic locations and incorporating prior information to obtain a posterior distribution for case trajectory. The compartmental model uses this distribution and predicts deaths using a random forest algorithm trained on COVID-19 data and population-level characteristics, yielding daily projections and interval estimates for infections and deaths in U.S. states. We evaluate forecasting accuracy on a two-week holdout set.
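To make the "velocity of the transformed cumulative case count" concrete, the sketch below computes a discrete velocity (day-over-day differences of transformed cumulative counts) and shows how a projected velocity curve can be integrated back into a cumulative case trajectory. The abstract does not name the transformation, so `log1p` is used here purely as a hypothetical stand-in; the function names `velocity` and `reconstruct` are also illustrative and not taken from the paper. This sketch does not implement the Bayesian mixed model, the random forest, or the compartmental model.

```python
import math
from typing import Callable, List, Sequence


def velocity(
    cumulative: Sequence[float],
    transform: Callable[[float], float] = math.log1p,  # hypothetical transform (assumption)
) -> List[float]:
    """Discrete first derivative (day-over-day difference) of transformed cumulative counts."""
    z = [transform(c) for c in cumulative]
    return [b - a for a, b in zip(z, z[1:])]


def reconstruct(
    start_cumulative: float,
    velocities: Sequence[float],
    transform: Callable[[float], float] = math.log1p,  # must match the transform used above
    inverse: Callable[[float], float] = math.expm1,    # inverse of log1p
) -> List[float]:
    """Integrate a (projected) velocity curve back to cumulative counts on the original scale."""
    z = transform(start_cumulative)
    out = []
    for v in velocities:
        z += v                  # accumulate velocity on the transformed scale
        out.append(inverse(z))  # map back to the cumulative case count
    return out
```

For example, tenfold daily growth (`[0, 9, 99, 999]`) gives a constant velocity of about `log(10)` per day under this transform, and feeding that velocity back through `reconstruct(0, ...)` recovers `9, 99, 999`. In a model like the one described, the velocity curve would be the fitted, location-specific quantity, with posterior draws of it propagated through the same integration step.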

Findings: The model predicts COVID-19 cases and deaths well, with a mean absolute scaled error (MASE) of 0.40 for cases and 0.32 for deaths over the two-week evaluation period. Comparing three contrasting states (New York, Ohio, and Mississippi) illustrates the substantial between-state variation in predicted trajectories and their associated uncertainty.
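The abstract reports accuracy as mean absolute scaled error but does not state how the errors are scaled. The sketch below assumes the common non-seasonal definition: the mean absolute forecast error on the holdout period divided by the mean absolute one-step naive error on the training series. Under that assumption, values below 1 (such as those reported) indicate forecasts that beat, on average, the in-sample naive "tomorrow equals today" forecast. The function name `mase` is illustrative.

```python
from typing import Sequence


def mase(actual: Sequence[float], forecast: Sequence[float], training: Sequence[float]) -> float:
    """Mean absolute scaled error, assuming scaling by the in-sample one-step naive MAE."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    if len(training) < 2:
        raise ValueError("training series needs at least two points")
    # Denominator: mean absolute change between consecutive training observations.
    scale = sum(abs(b - a) for a, b in zip(training, training[1:])) / (len(training) - 1)
    if scale == 0:
        raise ValueError("training series is constant; MASE is undefined")
    # Numerator: mean absolute error of the forecasts over the holdout period.
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale
```

For instance, with training counts `[1, 2, 4, 7]` (naive MAE of 2) and holdout actuals `[10, 12]` forecast as `[11, 10]` (MAE of 1.5), the MASE is 0.75.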

Interpretation: Through its sophistication and accuracy, this COVID-19 model offers reliable predictions and uncertainty estimates for the current trajectory of the pandemic in the U.S., and it provides a platform for future predictions as shifting political and societal responses alter its course.

Funding Statement: Research partially supported by unrestricted grants from Private Health Management.

Declaration of Interests: GLW, JAZ, PS, TB, and MAS report personal fees from Private Health Management during the conduct of the study. CMR reports grants and personal fees from Private Health Management. MAS also reports grants from US National Institutes of Health, grants from IQVIA, and personal fees from Janssen Research and Development. All other authors declare no competing interests.

Keywords: COVID-19; coronavirus; mathematical model; Bayesian mixed model; machine learning; random forests

Suggested Citation

Watson, Gregory L. and Xiong, Di and Zhang, Lu and Zoller, Joseph A. and Shamshoian, John and Sundin, Phillip and Bufford, Teresa and Rimoin, Anne W. and Suchard, Marc A. and Ramirez, Christina M., Fusing a Bayesian Case Velocity Model with Random Forest for Predicting COVID-19 in the U.S. (5/3/2020). Available at SSRN: https://ssrn.com/abstract=3594606 or http://dx.doi.org/10.2139/ssrn.3594606

Gregory L. Watson (Contact Author)

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Di Xiong

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Lu Zhang

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Joseph A. Zoller

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

John Shamshoian

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Phillip Sundin

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Teresa Bufford

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States

Anne W. Rimoin

University of California, Los Angeles (UCLA) - Department of Epidemiology

Los Angeles, CA 90095-1772
United States

Marc A. Suchard

University of California, Los Angeles (UCLA) - David Geffen School of Medicine

1000 Veteran Avenue, Box 956939
Los Angeles, CA 90095-6939
United States

Christina M. Ramirez

University of California, Los Angeles (UCLA) - Department of Biostatistics

Los Angeles, CA
United States