Yan Yan, Chen Rui, Kang Hakmook, Tan Yuting, Tiwari Anshul, Ma Siyuan, Wen Zhexing, Zhong Xue, Li Bingshan
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America.
Vanderbilt Genetics Institute, Nashville, Tennessee, United States of America.
PLoS Comput Biol. 2025 Jul 21;21(7):e1013303. doi: 10.1371/journal.pcbi.1013303. eCollection 2025 Jul.
Identifying risk genes associated with complex traits remains challenging. Integrating gene expression data with genome-wide association study (GWAS) data through transcriptome-wide association study (TWAS) methods has uncovered candidate risk genes for various complex traits. Splicing, which explains a share of complex-trait heritability comparable to that explained by gene expression, remains under-explored because of its multidimensionality. To leverage multiple splicing events within a gene and splicing shared across tissues, we develop Multi-tissue Splicing Gene (MTSG), a method that employs tensor decomposition and sparse Canonical Correlation Analysis (sCCA) to extract meaningful information from high-dimensional splicing events across multiple tissues. We build MTSG models using GTEx data and apply them to GWAS summary statistics for Alzheimer's disease (AD) (111,326 cases and 677,663 controls) and schizophrenia (SCZ) (36,989 cases and 113,075 controls). We identify 174 and 497 significant splicing-mediated risk genes for AD and SCZ, respectively, after Bonferroni correction. For AD, our results demonstrate significant enrichment of AD-related pathways and identify additional AD risk genes not detected in the single-tissue analysis, while preserving most of the top genes identified in the brain frontal cortex. Consistently, for SCZ, genes identified by our brain-wide MTSG model, built from a cluster of 13 brain tissues, exhibit stronger enrichment in SCZ-relevant genes, and MTSG identifies unique SCZ risk genes compared to single-tissue models. These results show that our MTSG models capture distinctive splicing events across tissues that might be overlooked when a single tissue is used alone. Our MTSG models can be applied to other complex traits to help identify splicing-mediated disease risk genes.
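To make the final testing step concrete, the sketch below shows one common way a TWAS-style model is applied to GWAS summary statistics and then thresholded with a Bonferroni correction. It is an illustration only: the abstract does not give MTSG's exact test statistic or significance level, so the weighted z-score formula (the standard summary-statistic form used by TWAS methods in general), the default `alpha = 0.05`, and all function names, weights, and gene labels here are assumptions rather than the authors' implementation.

```python
import math

def association_z(weights, gwas_z, ld):
    """Gene-level z-score from SNP weights, GWAS z-scores and an LD matrix.

    Assumed generic form: z = (w . z_gwas) / sqrt(w' R w), where R is the
    SNP correlation (LD) matrix. Not confirmed as the MTSG statistic.
    """
    n = len(weights)
    numerator = sum(weights[i] * gwas_z[i] for i in range(n))
    # Quadratic form w' R w: variance of the weighted sum under the null.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("weights and LD matrix give a non-positive variance")
    return numerator / math.sqrt(variance)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni_significant(p_values, alpha=0.05):
    """Return genes passing alpha / (number of tests), plus the threshold.

    alpha = 0.05 is an assumed default; the abstract does not state it.
    """
    threshold = alpha / len(p_values)
    hits = {gene: p for gene, p in p_values.items() if p < threshold}
    return hits, threshold

# Hypothetical two-SNP gene with uncorrelated SNPs (identity LD matrix).
z_gene = association_z([1.0, 0.0], [3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
p_gene = two_sided_p(z_gene)

# Hypothetical p-values for two made-up gene labels.
hits, threshold = bonferroni_significant({"GENE_A": 1e-8, "GENE_B": 0.03})
```

With many thousands of genes tested, the Bonferroni threshold becomes very small, which is why only strongly associated genes survive the correction.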