Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact preprints@lancet.com.
Developing a Pre-Testing Diagnostic Tool for COVID-19 Using Big Data Predictive Analytics
28 Pages Posted: 29 Sep 2020
Abstract
Background: Standard polymerase chain reaction (PCR) tests for SARS-CoV-2 are too scarce to meet demand in many countries, creating a need to improve testing efficiency. Pre-testing tools can help ensure continued public safety as health systems move through the pandemic. In this study, we set out to create an instrument based on big data predictive tools to assess pre-test probability for COVID-19.
Methods: We analyzed data reported by the Israeli Ministry of Health (IMOH) on standard PCR tests for SARS-CoV-2 performed from March to April 2020, 108,852 cases in total. Demographics and symptoms of the patients were collected at the time of testing. Four supervised machine learning algorithms were used to analyze the 20,537 test results of cases who presented with symptoms. Model results were used to develop an efficient pre-test diagnostic tool.
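As an illustration of how a supervised model can turn reported symptoms into a pre-test probability, the following is a minimal sketch only. The abstract does not name the four algorithms used, so a Bernoulli naive Bayes classifier stands in here as one simple example of supervised learning. The feature names, the toy records, and the function names `fit_bernoulli_nb` and `pretest_probability` are all hypothetical and are not taken from the IMOH data or the authors' tool.

```python
from math import exp, log

# Hypothetical 0/1 feature names; the actual IMOH fields are not given in the abstract.
FEATURES = ["fever", "headache", "sore_throat", "shortness_of_breath", "contact"]


def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on 0/1 feature rows.

    alpha is the Laplace smoothing constant, which keeps every estimated
    probability strictly between 0 and 1.
    """
    model = {}
    n_total = len(labels)
    for cls in (0, 1):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        count = len(cls_rows)
        prior = (count + alpha) / (n_total + 2 * alpha)
        # Smoothed estimate of P(feature = 1 | class) for each feature
        probs = [
            (sum(r[j] for r in cls_rows) + alpha) / (count + 2 * alpha)
            for j in range(len(FEATURES))
        ]
        model[cls] = (prior, probs)
    return model


def pretest_probability(model, row):
    """Return the modelled probability of a positive test for one 0/1 feature row."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = log(prior)
        for x, p in zip(row, probs):
            score += log(p) if x else log(1.0 - p)
        log_scores[cls] = score
    # Normalise in log space for numerical stability
    top = max(log_scores.values())
    weights = {c: exp(s - top) for c, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Toy records invented purely for illustration (not IMOH data)
toy_rows = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
toy_labels = [1, 1, 1, 0, 0, 0]

model = fit_bernoulli_nb(toy_rows, toy_labels)
high = pretest_probability(model, [1, 1, 0, 0, 1])  # fever, headache, contact
low = pretest_probability(model, [0, 0, 1, 0, 0])   # sore throat only
print(f"high-risk profile: {high:.3f}, low-risk profile: {low:.3f}")
```

On these six toy records the first profile scores well above 0.5 and the second well below it. With real data, any such score would need validation on a held-out test sample before being used to prioritise PCR testing.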
Findings: Of the symptomatic patients tested, 6,427 (31.3%) tested positive for SARS-CoV-2 and 14,110 (68.7%) tested negative. In all models, headache, shortness of breath, sore throat, fever, and contact with an infected person emerged as the most predictive of a positive test. The area under the receiver operating characteristic curve (AUC) for the test sample was 0.88, and the misclassification rate was between 4.7% and 6.5% across the predictive models, demonstrating effective classification ability. Using our pre-test probability screening tool together with conventional PCR testing can potentially increase efficiency by 141%.
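For readers less familiar with the two evaluation metrics reported above, the following sketch shows how they are commonly computed. It is not the authors' evaluation code. The class counts are taken from the abstract, while the toy labels, toy scores, the 0.5 decision threshold, and the function names `roc_auc` and `misclassification_rate` are assumptions made for illustration, so the toy results do not reproduce the reported 0.88 or 4.7% to 6.5%.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via its rank interpretation.

    The AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction disagrees with the label."""
    errors = sum(1 for y, s in zip(labels, scores) if (s >= threshold) != (y == 1))
    return errors / len(labels)


# Class balance, using the counts stated in the abstract
positives, negatives = 6427, 14110
total = positives + negatives
share_positive = round(100 * positives / total, 1)
share_negative = round(100 * negatives / total, 1)
print(total, share_positive, share_negative)  # 20537 31.3 68.7

# Toy labels and scores invented for illustration (not study data)
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
auc = roc_auc(toy_labels, toy_scores)
err = misclassification_rate(toy_labels, toy_scores)
print(auc, err)  # 0.9375 0.25
```

The pairwise loop is quadratic in the number of cases, which is fine for a small example; for a sample of 20,537 cases a rank-based computation from a statistics library would be the more practical choice.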
Interpretation: We suggest that a simple diagnostic pre-test tool for assessing the probability of infection can increase the efficiency of testing and the effectiveness of public health COVID-19 programs.
Funding: None
Declaration of Interests: The authors declare no competing interests.
Ethics Approval Statement: The authors noted that the analysis presented was approved by the IMOH Data Sharing Institutional Review Board.
Keywords: COVID-19; SARS-CoV-2; PCR tests; Machine Learning; Predictive Analytics; Pre-Testing; Big Data