Prediction of Tissue of Origin and Molecular Subtypes for Cancer of Unknown Primary Using Machine Learning
436 Pages. Posted: 18 Feb 2020
It is estimated that approximately 5% of all metastatic tumors have no defined primary site despite an adequate diagnostic workup; these are classified as cancers of unknown primary (CUP). CUP patients cannot receive site-specific therapy and have a poor prognosis. Knowledge of a tumor’s primary site and molecular subtype can potentially play a critical role in choosing a treatment regimen and in estimating prognosis. We developed a deep learning method that identifies the primary site, trained on the transcriptional profiles of annotated primary tumors across 32 cancer types from The Cancer Genome Atlas (TCGA). Gene expression values are ordered by the chromosomal coordinates of their genes and supplied as input to a 1-D Inception convolutional neural network (CNN), which applies multiple convolutional kernels with different configurations in parallel to improve generalization. The model was optimized through extensive hyperparameter tuning, including different max-pooling and dropout settings. It identifies the primary site with an overall top-1 accuracy of 97.20% in cross-validation and 92.64% in independent external validation on metastatic tumors with known primaries. Further, given a putative tissue of origin, we developed models that classify the molecular subtype of a sample for 11 primary cancer types. This method for identifying the primary site and molecular subtype will provide better therapeutic opportunities for CUP patients.
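To make the input representation and the parallel-kernel idea concrete, the following is a minimal pure-Python sketch. It assumes a toy expression profile with hypothetical gene names and made-up coordinates, a chromosome order of 1 to 22, then X, then Y, and hand-picked kernel weights. It shows a simplified, generic Inception-style 1-D block (parallel convolutions of different widths plus a max-pooling branch, each keeping the input length), not the authors' actual architecture, layer sizes, or trained weights.

```python
from typing import Dict, List, Tuple

# Assumed chromosome sort order: 1..22, then X, then Y.
CHROM_ORDER = {**{str(i): i for i in range(1, 23)}, "X": 23, "Y": 24}


def order_by_chromosome(expr: Dict[str, float],
                        coords: Dict[str, Tuple[str, int]]) -> List[float]:
    """Return expression values sorted by (chromosome, start position)."""
    genes = sorted(expr, key=lambda g: (CHROM_ORDER[coords[g][0]], coords[g][1]))
    return [expr[g] for g in genes]


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """1-D cross-correlation with zero padding; odd kernel width keeps the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + x + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def maxpool_same(x: List[float], width: int = 3) -> List[float]:
    """Stride-1 max pooling clipped at the edges, so output length equals input length."""
    half = width // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def inception_block(x: List[float], kernels: List[List[float]]) -> List[List[float]]:
    """Run one conv+ReLU branch per kernel plus a max-pool branch, in parallel.

    Returns one channel per branch; stacking them is the channel-wise concatenation.
    """
    branches = [relu(conv1d_same(x, k)) for k in kernels]
    branches.append(maxpool_same(x))
    return branches


# Hypothetical toy profile: gene names and (chromosome, start) coordinates are invented.
expr = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_C": 1.0, "GENE_D": 3.0}
coords = {"GENE_A": ("2", 500), "GENE_B": ("1", 100),
          "GENE_C": ("X", 10), "GENE_D": ("1", 50)}

x = order_by_chromosome(expr, coords)
print(x)  # → [3.0, 0.5, 2.0, 1.0]

# Kernels of width 1, 3 and 5 stand in for the "different configurations".
channels = inception_block(x, [[1.0], [-1.0, 2.0, -1.0], [1.0] * 5])
for ch in channels:
    print(ch)
```

Ordering by genomic position means a convolution window covers genes that are neighbours on the chromosome, and running several kernel widths side by side lets the network pick up patterns at more than one genomic scale. A trainable version would learn the kernel weights and follow the block with further layers and a softmax over the 32 cancer types.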
Funding Statement: Funding for the project was provided by Cancer Research UK and the British Columbia Cancer Agency Branch. This work was supported by the Leukemia Research Foundation New Investigator Grant, The Jackson Laboratory Cancer Center New Investigator Award, and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133562. Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number P30CA034196.
Declaration of Interests: The authors declare no competing interests.
Ethics Approval Statement: Not required.
Keywords: Cancer; Classification; Machine Learning; Deep Learning; Cancer of Unknown Primary; Convolutional Neural Networks; TCGA; 1-D Inception Network