Multi-Ancestry Transcriptome Prediction with Functionally Informed Variants in TOPMed MESA Improves Performance of Transcriptome-Wide Association Studies

Reliable reference transcriptome prediction models are key to accurate transcriptome-wide association studies (TWAS). With the emergence of multi-ancestry genome-wide association studies (GWAS), there is a need for reliable multi-ancestry transcriptome prediction models for downstream TWAS. Here, we propose three methods that leverage functionally informed variants (FIVs), which are more likely to influence gene expression, to improve multi-ancestry TWAS; we refer to these as FIV-based methods. We trained transcriptome prediction models on 1,287 multi-ancestry participants from the Trans-Omics for Precision Medicine (TOPMed) program Multi-Ethnic Study of Atherosclerosis (MESA) with RNA-seq data from peripheral blood mononuclear cells (PBMCs). We validated the models’ prediction accuracy in two independent external data sets, Geuvadis and the Jackson Heart Study (JHS). To test the robustness of our FIV-based methods for multi-ancestry TWAS, we integrated the resulting transcriptome prediction models with three large-scale multi-ancestry GWAS of blood cell, lipid, and pulmonary function traits. Our FIV-based methods achieved prediction accuracy similar to that of the benchmark method, Elastic Net (EN), while using a smaller and more accurate set of variants. Additionally, our FIV-based methods achieved significantly higher TWAS power for three GWAS traits (P<0.05, Mann-Whitney U test) and higher TWAS accuracy, measured by F1 score, for all GWAS traits except two blood cell traits, with an average accuracy improvement of 24% over EN. However, no single proposed method performed best across all GWAS traits. To further improve TWAS performance, we propose an omnibus approach that aggregates TWAS summary statistics from our FIV-based methods. The omnibus approach yielded the highest number of Bonferroni-significant TWAS genes for all GWAS traits, and it further improved TWAS power and accuracy for blood cell traits.
Additionally, the omnibus approach detected some important trait-relevant genes that EN missed. We provide three examples in the manuscript demonstrating the improvement achieved by our omnibus approach. Our study demonstrates the value of including FIVs in multi-ancestry transcriptome prediction models for improving TWAS performance. Further, the improvement in TWAS performance depends on the relevance of the GWAS trait to the tissue or cell type used to build the transcriptome prediction models.
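The abstract does not state which statistic the omnibus approach uses to aggregate TWAS summary statistics, how the Bonferroni threshold is set, or which reference gene set the F1 score is computed against. The sketch below is a minimal illustration under stated assumptions only: it combines per-gene p-values across FIV-based methods with an equal-weight Cauchy combination (ACAT-style), a common omnibus choice that may differ from the authors' method. It applies a Bonferroni threshold of alpha divided by the number of genes tested and scores detections by F1 against a hypothetical reference set. All function names, gene names, and p-values are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def cauchy_combine(pvalues: Sequence[float]) -> float:
    """Equal-weight Cauchy combination of p-values (ACAT-style).

    Assumption: one common omnibus statistic, not necessarily the paper's.
    """
    k = len(pvalues)
    stat = 0.0
    for p in pvalues:
        # Clamp to the open interval (0, 1) so tan() stays finite.
        p = min(max(p, 1e-300), 1.0 - 1e-16)
        if p < 1e-15:
            # For tiny p, tan((0.5 - p) * pi) is approximately 1 / (p * pi).
            stat += (1.0 / (p * math.pi)) / k
        else:
            stat += math.tan((0.5 - p) * math.pi) / k
    # Map the Cauchy statistic back to a p-value; for stat > 0 use the
    # identity 0.5 - atan(t)/pi == atan(1/t)/pi to avoid cancellation.
    if stat > 0:
        return math.atan(1.0 / stat) / math.pi
    return 0.5 - math.atan(stat) / math.pi


def omnibus_twas(per_method: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine per-gene TWAS p-values across methods (hypothetical layout:
    method name -> gene -> p-value). A gene is combined over whichever
    methods produced a model for it."""
    genes = set().union(*(res.keys() for res in per_method.values()))
    combined = {}
    for gene in genes:
        ps = [res[gene] for res in per_method.values() if gene in res]
        combined[gene] = cauchy_combine(ps)
    return combined


def bonferroni_hits(gene_pvalues: Dict[str, float], alpha: float = 0.05) -> Set[str]:
    """Genes passing alpha / (number of genes tested); the denominator is an assumption."""
    threshold = alpha / len(gene_pvalues)
    return {g for g, p in gene_pvalues.items() if p < threshold}


def f1_score(detected: Set[str], reference: Set[str]) -> float:
    """F1 of detected genes against a (hypothetical) reference gene set."""
    tp = len(detected & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(detected)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical p-values from three FIV-based models.
    per_method = {
        "method_1": {"GENE_A": 1e-8, "GENE_B": 0.3, "GENE_C": 0.04},
        "method_2": {"GENE_A": 2e-3, "GENE_B": 0.6},
        "method_3": {"GENE_A": 5e-7, "GENE_C": 1e-7},
    }
    hits = bonferroni_hits(omnibus_twas(per_method))
    print(sorted(hits))                             # ['GENE_A', 'GENE_C']
    print(f1_score(hits, {"GENE_A", "GENE_B"}))     # 0.5
```

The Cauchy combination is a convenient illustrative choice because it remains valid under arbitrary dependence between the combined tests, which matters here since the FIV-based models for a gene share many variants.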

Keywords: Transcriptome prediction models, transcriptome-wide association study, functional annotation, multi-ancestry

Suggested Citation:

Hu, Xiaowei and Araujo, Daniel S. and Khunsriraksakul, Chachrit and Wang, Lida and Sun, Quan and Wen, Jia and Zhou, Lingbo and Ekunwe, Lynette and Lange, Leslie A. and Lange, Ethan M. and Montgomery, Stephen B. and Reiner, Alexander P. and Aguet, Francois and Ardlie, Kristin G. and Lappalainen, Tuuli and Gignoux, Christopher R. and Burchard, Esteban and Taylor, Kent D. and Guo, Xiuqing and Rotter, Jerome I. and Rich, Stephen S. and Cornell, Elaine and Durda, Peter and Tracy, Russell P. and Liu, Yongmei and Johnson, W. Craig and Papanicolaou, George P. and Perera, Minoli A. and Cho, Michael H. and Liu, Dajiang J. and Raffield, Laura M. and Li, Yun and TOPMed Multi-Omics Working Group and Wheeler, Heather E. and Im, Hae Kyung and Manichaikul, Ani, Multi-Ancestry Transcriptome Prediction with Functionally Informed Variants in TOPMed MESA Improves Performance of Transcriptome-Wide Association Studies. Available at SSRN: https://ssrn.com/abstract=5194962 or http://dx.doi.org/10.2139/ssrn.5194962

This version of the paper has not been formally peer reviewed.