Latent Model-Based Clustering for Biological Discovery

24 Pages. Posted: 5 Sep 2018. Sneak Peek Status: Review Complete

Xin Bing

Cornell University - Department of Statistical Science

Florentina Bunea

Cornell University - Department of Statistical Science

Martin Royer

Cornell University - Department of Statistical Science

Jishnu Das

Massachusetts Institute of Technology (MIT) - Department of Biological Engineering

Abstract

We present LOVE, a robust, highly scalable latent model-based clustering method for biological discovery. LOVE can be used across a range of datasets to generate both overlapping and non-overlapping clusters. In our formulation, a cluster comprises the variables associated with the same latent factor, and is determined from an allocation matrix that indexes our latent model. We prove that the allocation matrix and the corresponding clusters are uniquely defined. We apply LOVE to a gene expression dataset and demonstrate that it detects biologically meaningful clusters. LOVE outperforms existing methods both in the significance of the clusters and in correctly identifying overlaps corresponding to pleiotropic gene function. Next, we apply LOVE to serological responses measured from HIV controllers and chronic progressors, and accurately cluster these two distinct clinical phenotypes in a non-overlapping fashion. For both datasets, the clusters generated by LOVE remain stable across a range of tuning parameters. Overall, our results demonstrate that LOVE can be used across a wide range of large-scale datasets for novel biological discovery.
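
As a rough illustration of how clusters could be read off an allocation matrix, the sketch below treats the matrix as a list of rows (one per variable) with one column per latent factor, and assigns a variable to every factor whose entry in its row is nonzero. This is not the LOVE estimation procedure; the function names, the tolerance `tol`, and the toy matrix `A` are assumptions made for illustration only.

```python
def clusters_from_allocation(A, tol=1e-8):
    """Return one list of variable indices per latent factor (column of A).

    Assumption: variable j is associated with factor k when |A[j][k]| > tol.
    """
    if not A:
        return []
    n_factors = len(A[0])
    clusters = [[] for _ in range(n_factors)]
    for j, row in enumerate(A):
        for k, weight in enumerate(row):
            if abs(weight) > tol:
                clusters[k].append(j)
    return clusters


def overlapping_variables(A, tol=1e-8):
    """Return indices of variables associated with more than one factor."""
    return [j for j, row in enumerate(A)
            if sum(abs(w) > tol for w in row) > 1]


# Hypothetical 4-variable, 2-factor allocation matrix (not from the paper).
A = [
    [1.0, 0.0],   # variable 0: factor 0 only
    [1.0, 0.0],   # variable 1: factor 0 only
    [0.5, -0.5],  # variable 2: both factors, so it sits in an overlap
    [0.0, 1.0],   # variable 3: factor 1 only
]

clusters = clusters_from_allocation(A)
overlaps = overlapping_variables(A)
```

In this toy case variable 2 belongs to both clusters, which corresponds to the overlapping setting described in the abstract; a matrix with exactly one nonzero entry per row would give non-overlapping clusters.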

Suggested Citation

Bing, Xin and Bunea, Florentina and Royer, Martin and Das, Jishnu, Latent Model-Based Clustering for Biological Discovery (September 5, 2018). Available at SSRN: https://ssrn.com/abstract=3244532 or http://dx.doi.org/10.2139/ssrn.3244532
This is a paper under consideration at Cell Press and has not been peer-reviewed.

Xin Bing (Contact Author)

Cornell University - Department of Statistical Science

Ithaca, NY 14853
United States

Florentina Bunea

Cornell University - Department of Statistical Science

Ithaca, NY 14853
United States

Martin Royer

Cornell University - Department of Statistical Science

Ithaca, NY 14853
United States

Jishnu Das

Massachusetts Institute of Technology (MIT) - Department of Biological Engineering

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States