Leveraging Protein Language Models to Identify Complex Trait Associations with Previously Inaccessible Classes of Functional Rare Variants

36 Pages

Posted: 16 Dec 2024

Publication Status: Under Review

Seon-Kyeong Jang

University of California, Los Angeles (UCLA)

Zitian Wang

University of California, Los Angeles (UCLA)

Richard Border

University of California, Los Angeles (UCLA)

Angela Wei

University of California, Los Angeles (UCLA)

Ulzee An

University of California, Los Angeles (UCLA)

Sriram Sankararaman

University of California, Los Angeles (UCLA)

Vasilis Ntranos

University of California, San Francisco (UCSF)

Jonathan Flint

University of California, Los Angeles (UCLA)

Noah Zaitlen

University of California, Los Angeles (UCLA)

Abstract

Protein language models (PLMs) provide variant effect predictions for previously underexplored classes of rare variants in exome sequencing studies. Here we present novel approaches that leverage the unique properties of PLMs to test for associations between complex traits and rare variants. First, we develop an allelic series-based regression test for isoform-specific variants and discover ~22% more significant associations than standard tests. Furthermore, 17 gene-trait pairs show significantly larger effect sizes in non-canonical than in canonical transcripts. Next, we search for Evolutionary Plausible Variants (EPVs), variants to which PLMs assign positive scores and which lie at the opposite end of the spectrum from annotated deleterious variants. We find that EPVs make up a small percentage of missense variants (0.45%) and that, consistent with differential selection pressures, their allele frequencies are significantly higher than those of non-EPV and synonymous variants (p < 2.2e-16). We additionally identify eight associations with EPVs, including novel protective associations with LDL and bone mineral density. Our results show how applying PLMs to exome data expands the universe of gene-trait association mapping and interpretation.
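As a rough illustration of the EPV definition in the abstract (missense variants given a positive score by a PLM), the following minimal sketch partitions a variant list at a score of zero and summarizes allele frequencies in each group. It is not the authors' pipeline: the `Variant` fields, the function names, and the toy values are hypothetical, and the code assumes a score convention in which values above zero mean the PLM favors the variant residue over the reference.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Variant:
    variant_id: str     # hypothetical identifier, for illustration only
    plm_score: float    # assumed PLM score; > 0 means the variant residue is favored
    allele_freq: float  # allele frequency of the variant in the cohort


def split_epvs(variants):
    """Partition variants into EPVs (score > 0) and non-EPVs (score <= 0)."""
    epvs = [v for v in variants if v.plm_score > 0]
    non_epvs = [v for v in variants if v.plm_score <= 0]
    return epvs, non_epvs


def summarize(variants):
    """Return the EPV fraction and the median allele frequency of each group."""
    epvs, non_epvs = split_epvs(variants)
    epv_afs = [v.allele_freq for v in epvs]
    non_epv_afs = [v.allele_freq for v in non_epvs]
    return {
        "epv_fraction": len(epvs) / len(variants),
        "median_af_epv": median(epv_afs) if epv_afs else None,
        "median_af_non_epv": median(non_epv_afs) if non_epv_afs else None,
    }


# Toy data with made-up values; these are not results from the paper.
toy_variants = [
    Variant("v1", -7.2, 1e-5),
    Variant("v2", -3.1, 2e-5),
    Variant("v3", -0.4, 4e-5),
    Variant("v4", 0.8, 3e-4),
]
summary = summarize(toy_variants)
```

A real analysis would replace the median comparison with a formal test of the allele frequency difference and would use scores from an actual PLM; the zero threshold is the only element taken directly from the abstract.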

Keywords: protein language model, rare variant, gene-based test, exome sequence

Suggested Citation

Jang, Seon-Kyeong and Wang, Zitian and Border, Richard and Wei, Angela and An, Ulzee and Sankararaman, Sriram and Ntranos, Vasilis and Flint, Jonathan and Zaitlen, Noah and Administrator, Sneak Peek, Leveraging Protein Language Models to Identify Complex Trait Associations with Previously Inaccessible Classes of Functional Rare Variants. Available at SSRN: https://ssrn.com/abstract=5055097 or http://dx.doi.org/10.2139/ssrn.5055097
This version of the paper has not been formally peer reviewed.

Seon-Kyeong Jang (Contact Author)

University of California, Los Angeles (UCLA)
