Latent Model-Based Clustering for Biological Discovery
24 Pages. Posted: 5 Sep 2018. Sneak Peek Status: Published.
We present LOVE, a robust, highly scalable latent model-based clustering method for biological discovery. LOVE can be used across a range of datasets to generate both overlapping and non-overlapping clusters. In our formulation, a cluster comprises the variables associated with the same latent factor, and is determined from an allocation matrix that indexes our latent model. We prove that the allocation matrix and the corresponding clusters are uniquely defined. We apply LOVE to a gene expression dataset and demonstrate that it detects biologically meaningful clusters. LOVE outperforms existing methods both in the significance of the clusters and in correctly identifying overlaps corresponding to pleiotropic gene function. Next, we apply LOVE to serological responses measured from HIV controllers and chronic progressors, and accurately cluster these two distinct clinical phenotypes in a non-overlapping fashion. For both datasets, the clusters generated by LOVE remain stable across a range of tuning parameters. Overall, our results demonstrate that LOVE can be applied to a wide range of large-scale datasets for novel biological discovery.
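To make the cluster definition concrete, the following is a minimal sketch of how clusters could be read off an allocation matrix once it is available. It assumes the matrix has one row per variable and one column per latent factor, and that a variable belongs to a factor's cluster when its entry in that column is nonzero. The toy matrix, the tolerance, and the function names are hypothetical illustrations. Estimating the allocation matrix from data, which is the substance of the method, is not shown.

```python
def clusters_from_allocation(A, tol=1e-8):
    """Return one list of variable indices per latent factor (column of A).

    Assumption: variable j belongs to cluster k when |A[j][k]| > tol.
    """
    n_factors = len(A[0])
    clusters = [[] for _ in range(n_factors)]
    for j, row in enumerate(A):
        for k, loading in enumerate(row):
            if abs(loading) > tol:
                clusters[k].append(j)
    return clusters


def overlapping_variables(clusters):
    """Map each variable that sits in more than one cluster to its cluster indices."""
    membership = {}
    for k, members in enumerate(clusters):
        for j in members:
            membership.setdefault(j, []).append(k)
    return {j: ks for j, ks in membership.items() if len(ks) > 1}


# Hypothetical allocation matrix: 5 variables (rows), 2 latent factors (columns).
# Rows 0-3 load on a single factor; row 4 loads on both, so it overlaps.
A = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, -1.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

clusters = clusters_from_allocation(A)
overlaps = overlapping_variables(clusters)
print(clusters)
print(overlaps)
```

In this toy case the last variable appears in both clusters, which is the kind of overlap the abstract associates with pleiotropic gene function. A matrix in which every row has a single nonzero entry would yield non-overlapping clusters.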