Human Forest vs. Random Forest in Time-Sensitive COVID-19 Clinical Trial Prediction

24 Pages. Posted: 10 Dec 2021. Last revised: 27 Jul 2022

Pavel D. Atanasov

IE University; Pytho LLC

Regina Joseph

Pytho LLC

Felipe Feijoo

Pontifical Catholic University of Valparaiso

Max Marshall

Johns Hopkins University

Amanda Conway

American University

Sauleh Siddiqui

American University

Date Written: July 27, 2022

Abstract

How do we combine historical data and human insights to predict complex outcomes, such as the timely advancement of clinical trials? We report the methods and results of the first study comparing the new Human Forest (HF) method with a control crowdsourcing method and a machine model, a time-specific random survival forest (RSF) model. We provide the first description of the Human Forest method, which enables forecasters to define custom reference classes, query a historical database and review base rates specific to their selections. These base rates, and adjusted probabilistic estimates, are then aggregated. Forecasters receive proper scoring feedback and accuracy incentives. HF works in tandem with a new algorithm, Most Popular Selections (MPS), which provides a collective intelligence approach for addressing the long-standing reference class problem by crowdsourcing and aggregating reference class selections. The empirical validation spans two 6-month tournaments that focus on trial phase transition for vaccines and treatments for COVID-19 and other infectious diseases. The tournaments include 60 forecasting questions. Results show that HF significantly outperforms the RSF model, registering mean Brier scores between 36% and 52% lower than those earned by the RSF model. HF and Control Polls exhibit approximately equivalent performance. Including Human Forest-derived base rate estimates at the aggregation stage improves overall performance. MPS-generated base-rate estimates exhibit performance between that of RSF and human crowdsourcing methods. Our results show that human forecaster crowds with appropriate elicitation and aggregation tools can outperform statistical models. Interactive access to data through HF appears either beneficial or neutral to forecasting performance, even in a setting where new developments deviate from historical patterns.
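
As an illustration of how a comparison such as "mean Brier scores between 36% and 52% lower" can be computed, the sketch below scores two sets of probabilistic forecasts against binary outcomes and reports the relative reduction. It assumes the common single-outcome form of the Brier score (mean squared difference between the forecast probability and the 0/1 outcome); the abstract does not state which variant the paper uses. The function names and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect score under this (assumed) definition.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))


def relative_reduction(score, baseline):
    """Fraction by which `score` is lower than `baseline` (0.4 means 40% lower)."""
    return 1.0 - score / baseline


# Hypothetical data: four resolved questions, 1 = event occurred, 0 = it did not.
outcomes = [1, 0, 0, 1]
crowd_forecasts = [0.8, 0.2, 0.1, 0.7]   # illustrative crowd probabilities
model_forecasts = [0.6, 0.4, 0.3, 0.5]   # illustrative model probabilities

crowd_bs = brier_score(crowd_forecasts, outcomes)
model_bs = brier_score(model_forecasts, outcomes)
reduction = relative_reduction(crowd_bs, model_bs)

print(f"crowd Brier: {crowd_bs:.3f}")
print(f"model Brier: {model_bs:.3f}")
print(f"crowd score is {reduction:.0%} lower than the model score")
```

Because the Brier score is a proper scoring rule, a forecaster minimizes the expected score by reporting their true probability, which is why it suits the scoring feedback and accuracy incentives described in the abstract.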

Keywords: Forecasting, Crowdsourcing, Machine Learning, Clinical Development

JEL Classification: C45, I10, C44

Suggested Citation

Atanasov, Pavel D. and Joseph, Regina and Feijoo, Felipe and Marshall, Max and Conway, Amanda and Siddiqui, Sauleh, Human Forest vs. Random Forest in Time-Sensitive COVID-19 Clinical Trial Prediction (July 27, 2022). Available at SSRN: https://ssrn.com/abstract=3981732 or http://dx.doi.org/10.2139/ssrn.3981732

Pavel D. Atanasov (Contact Author)

IE University

Castellón de la Plana 8
Madrid, 28006
Spain

Pytho LLC

Madrid
Spain
641179247 (Phone)

HOME PAGE: http://pavelatanasov.net

Regina Joseph

Pytho LLC

866 President Street
Brooklyn, NY 11215
United States

Felipe Feijoo

Pontifical Catholic University of Valparaiso

Avenida Brasil 2950
Escuela de Comercio - UCV
Valparaiso, Valparaiso 2362736
Chile

Max Marshall

Johns Hopkins University

Baltimore, MD 20036-1984
United States

Amanda Conway

American University

4400 Massachusetts Ave, NW
Washington, DC 20016
United States

Sauleh Siddiqui

American University

4400 Massachusetts Ave, NW
Washington, DC 20016
United States

Paper statistics

Downloads: 272
Abstract Views: 1,782
Rank: 229,257