Machine Learning with Statistical Imputation for Predicting Drug Approvals

60 Pages. Posted: 25 May 2017. Last revised: 21 May 2019.

Andrew W. Lo

Massachusetts Institute of Technology (MIT) - Sloan School of Management; National Bureau of Economic Research (NBER); Massachusetts Institute of Technology (MIT) - Computer Science and Artificial Intelligence Laboratory (CSAIL)

Kien Wei Siah

Massachusetts Institute of Technology (MIT)

Chi Heem Wong

Massachusetts Institute of Technology (MIT) - Computer Science and Artificial Intelligence Laboratory (CSAIL); Massachusetts Institute of Technology (MIT); MIT Sloan School of Management

Date Written: October 1, 2018

Abstract

We apply machine-learning techniques to predict drug approvals using drug-development and clinical-trial data from 2003 to 2015 involving several thousand drug-indication pairs with over 140 features across 15 disease groups. To deal with missing data, we use imputation methods that allow us to fully exploit the entire dataset, the largest of its kind. We show that our approach outperforms complete-case analysis, which typically yields biased inferences. We achieve predictive measures of 0.78 and 0.81 AUC (“area under the receiver operating characteristic curve,” the estimated probability that a classifier will rank a randomly chosen positive outcome higher than a randomly chosen negative outcome) for predicting transitions from phase 2 to approval and phase 3 to approval, respectively. Using five-year rolling windows, we document an increasing trend in the predictive power of these models, a consequence of improving data quality and quantity. The most important features for predicting success are trial outcomes, trial status, trial accrual rates, duration, prior approval for another indication, and sponsor track records. We provide estimates of the probability of success for all drugs in the current pipeline.
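
The ranking interpretation of AUC given in the abstract can be made concrete with a small sketch. The function below counts, over all (positive, negative) pairs, how often the positive example receives the higher score, with ties counted as one half. The function name `pairwise_auc`, the labels, and the scores are hypothetical toy values chosen for illustration; they are not taken from the paper's data or code.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Estimate AUC as the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = approved, 0 = not approved.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.3, 0.8]

auc = pairwise_auc(labels, scores)
print(round(auc, 3))  # 8 of 9 pairs are ranked correctly -> 0.889
```

This quadratic pair count is convenient for small examples; on a dataset of several thousand drug-indication pairs, a rank-based routine from a statistics library would usually be the more practical choice.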

Keywords: biotech; pharmaceuticals; risk management; machine learning

JEL Classification: I10, I11, G11, G32, C55, C13

Suggested Citation

Lo, Andrew W. and Siah, Kien Wei and Wong, Chi Heem, Machine Learning with Statistical Imputation for Predicting Drug Approvals (October 1, 2018). Available at SSRN: https://ssrn.com/abstract=2973611 or http://dx.doi.org/10.2139/ssrn.2973611

Andrew W. Lo (Contact Author)

Massachusetts Institute of Technology (MIT) - Sloan School of Management

100 Main Street
E62-618
Cambridge, MA 02142
United States
617-253-0920 (Phone)
781 891-9783 (Fax)

HOME PAGE: http://web.mit.edu/alo/www

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

Massachusetts Institute of Technology (MIT) - Computer Science and Artificial Intelligence Laboratory (CSAIL)

Stata Center
Cambridge, MA 02142
United States

Kien Wei Siah

Massachusetts Institute of Technology (MIT)

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

Chi Heem Wong

Massachusetts Institute of Technology (MIT) - Computer Science and Artificial Intelligence Laboratory (CSAIL)

Stata Center
Cambridge, MA 02142
United States

Massachusetts Institute of Technology (MIT)

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

MIT Sloan School of Management

100 Main Street
Cambridge, MA 02142
United States
