Preprints with The Lancet is part of SSRN's First Look, a place where journals identify content of interest prior to publication. Authors have opted in at submission to The Lancet family of journals to post their preprints on Preprints with The Lancet. The usual SSRN checks and a Lancet-specific check for appropriateness and transparency have been applied. Preprints available here are not Lancet publications or necessarily under review with a Lancet journal. These preprints are early stage research papers that have not been peer-reviewed. The findings should not be used for clinical or public health decision making and should not be presented to a lay audience without highlighting that they are preliminary and have not been peer-reviewed. For more information on this collaboration, see the comments published in The Lancet about the trial period, and our decision to make this a permanent offering, or visit The Lancet's FAQ page, and for any feedback please contact email@example.com.
Non-NAT Definite Diagnosis Models of COVID-19 Based on Hematological Features
15 Pages Posted: 14 Dec 2020
Background: Because coronavirus disease 2019 (COVID-19) spreads rapidly, fast and accurate detection of COVID-19 patients is critical to containing the SARS-CoV-2 virus. At present, COVID-19 patients are mainly identified through viral nucleic acid testing (NAT). However, factors such as the timing of the test, the experience of test operators, and specimen preparation may affect the accuracy of the results. The purpose of this study was to use different classification and feature selection methods to improve the diagnostic accuracy for COVID-19 patients.
Methods: We used seven machine learning algorithms to develop non-NAT models that assist in the diagnosis of COVID-19. To reduce the number of input features while maintaining model performance, and thereby lower cost and time consumption, we applied three feature selection methods (the chi-square test, variance analysis, and feature importance tests) to identify the optimal feature sets.
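As a rough illustration of filter-style feature selection, the sketch below scores each feature against the diagnosis label with a chi-square statistic and with a one-way analysis-of-variance F statistic, then keeps the top-ranked features. It is a minimal sketch under stated assumptions, not the authors' pipeline: the feature names (`feat_a`, `feat_b`, `feat_c`), the toy values, the helper names, and the reading of "variance analysis" as a one-way ANOVA F test are all hypothetical choices made for this example.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi2_score(values, labels):
    """Chi-square statistic for a non-negative feature.

    Compares the observed per-class sum of the feature with the sum expected
    if the feature were independent of the class label.
    """
    total, n = sum(values), len(values)
    score = 0.0
    for y in set(labels):
        observed = sum(v for v, lab in zip(values, labels) if lab == y)
        expected = total * labels.count(y) / n
        score += (observed - expected) ** 2 / expected
    return score


def top_k(features, labels, score_fn, k):
    """Return the names of the k highest-scoring features."""
    scores = {name: score_fn(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feat_a": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # separates the classes
    "feat_b": [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0],  # same in both classes
    "feat_c": [5.0, 6.0, 5.5, 6.5, 6.0, 7.0, 6.5, 7.5],  # weakly informative
}

selected_anova = top_k(features, labels, anova_f, k=2)
selected_chi2 = top_k(features, labels, chi2_score, k=2)
print(selected_anova, selected_chi2)
```

On this toy data both scores rank `feat_a` first and drop `feat_b`. A model-based importance ranking, the third kind of method named above, would instead fit a classifier such as a tree ensemble and rank features by the importance the fitted model assigns to them.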
Findings: The XGBoost and random forest (RF) models performed best for COVID-19 detection, with the highest accuracy exceeding 0·96. The accuracy of the RF model was 0·968 when using only ten hematological features and body temperature.
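For readers unfamiliar with the metric, accuracy is the fraction of patients whose predicted status matches the reference diagnosis. The sketch below computes it from a confusion matrix, together with sensitivity and specificity. The labels and predictions are hypothetical toy values and do not reproduce the study's data or results; the function names are likewise assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary diagnosis."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def diagnostic_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


# Hypothetical reference diagnoses and model predictions (1 = positive).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can hide an imbalance between missed cases and false alarms, which is why sensitivity and specificity are reported alongside it in this sketch.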
Interpretation: Ten blood features and body temperature can determine fairly accurately whether a suspected patient has COVID-19. Our model can improve the diagnostic accuracy of COVID-19 and could help reduce its spread.
Funding: This work is supported by grants from the National Key Research and Development Program of China under Grant 2017YFE0123600, the Natural Science Foundation of China (81873931, 81974382 and 81773104), the Frontier Exploration Program of Huazhong University of Science and Technology (2015TS153), and the Major Scientific and Technological Innovation Projects in Hubei Province (2018ACA136).
Declaration of Interests: All the authors stated that the paper had never been published elsewhere, and that there were no competing economic interests.
Ethics Approval Statement: The collection, use, and retrospective analysis of chest CT images, CFs and SARS-CoV-2 nucleic acid PCR results of patients were approved by the institutional ethical committees of HUST-UH (IRB ID: IEC(A001)).
Keywords: COVID-19; non-NAT; machine learning; hematological features; optimum feature set