*K-Means and Cluster Models for Cancer Signatures

Biomolecular Detection and Quantification 13 (2017) 7-31

124 Pages. Posted: 1 Feb 2017. Last revised: 5 Oct 2017.


Zura Kakushadze

Quantigic Solutions LLC; Free University of Tbilisi

Willie Yu

Duke-NUS Medical School - Centre for Computational Biology

Date Written: January 30, 2017

Abstract

We present the *K-means clustering algorithm and its source code, expanding on the statistical clustering methods applied to quantitative finance in http://ssrn.com/abstract=2802753. *K-means is statistically deterministic without the need to specify initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). The computational cost of *K-means is a fraction of that of NMF. Using 1,389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers, indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.

Keywords: Clustering, K-Means, Nonnegative Matrix Factorization, Somatic Mutation, Cancer Signatures, Genome, Exome, DNA, eRank, Correlation, Covariance, Machine Learning, Sample, Matrix, Source Code, Quantitative Finance, Statistical Risk Model, Industry Classification, Bonds, Foreign Exchange, Alphas

JEL Classification: G00

Suggested Citation

Kakushadze, Zura and Yu, Willie, *K-Means and Cluster Models for Cancer Signatures (January 30, 2017). Biomolecular Detection and Quantification 13 (2017) 7-31. Available at SSRN: https://ssrn.com/abstract=2908286 or http://dx.doi.org/10.2139/ssrn.2908286

Zura Kakushadze (Contact Author)

Quantigic Solutions LLC

1127 High Ridge Road #135
Stamford, CT 06905
United States
6462210440 (Phone)
6467923264 (Fax)

HOME PAGE: http://www.linkedin.com/in/zurakakushadze

Free University of Tbilisi

Business School and School of Physics
240, David Agmashenebeli Alley
Tbilisi, 0159
Georgia

Willie Yu

Duke-NUS Medical School - Centre for Computational Biology

8 College Road
Singapore, 169857
Singapore

