Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data.

Affiliation

Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America.

Publication information

PLoS Genet. 2022 Jan 31;18(1):e1009571. doi: 10.1371/journal.pgen.1009571. eCollection 2022 Jan.

Abstract

Transcriptome-wide association studies (TWAS) are a powerful way to identify and interpret the biological mechanisms underlying GWAS signals by associating gene expression levels with phenotypes. In TWAS, gene expression is often imputed from individual-level genotypes of regulatory variants identified from external resources, such as the Genotype-Tissue Expression (GTEx) Project. In this setting, a straightforward approach to imputing expression levels in a specific tissue is to use a model trained on the same tissue type. When multiple tissues are available for the same subjects, training imputation models on multiple tissue types has been shown to improve accuracy, because eQTLs are shared between tissues and the effective sample size increases. However, existing joint-tissue methods require access to genotype and expression data across all tissues. Moreover, they cannot leverage the abundance of expression datasets from various tissues measured in non-overlapping individuals. Here, we explore the optimal way to combine imputed expression levels across training models from multiple tissues and datasets in a flexible manner using summary-level data. Our proposed method (SWAM) combines an arbitrary number of transcriptome imputation models to linearly optimize imputation accuracy for a given target tissue. By integrating models across tissues and/or individuals, SWAM can improve the accuracy of transcriptome imputation or the power of TWAS while requiring individual-level data from only a single reference cohort. To evaluate the accuracy of SWAM, we combined 49 tissue-specific gene expression imputation models from the GTEx Project as well as from the large eQTL study of the Depression Susceptibility Genes and Networks (DGN) Project, and tested imputation accuracy in GEUVADIS lymphoblastoid cell line samples. We also extend our meta-imputation method to meta-TWAS to leverage multiple tissues in TWAS analysis with summary-level statistics. Our results highlight the importance of integrating multiple tissues to unravel the regulatory impacts of genetic variants on complex traits.
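
The idea of linearly combining several imputation models for one target tissue can be illustrated with a small sketch. The abstract does not specify how SWAM computes its weights, so the code below substitutes ordinary ridge-regularized least squares as a stand-in. The function names `fit_combination_weights` and `meta_impute`, the `ridge` parameter, and the simulated data are all assumptions made for illustration, not part of SWAM itself. The sketch assumes that each model has already been applied to the genotypes of a single reference cohort in which target-tissue expression is measured.

```python
import numpy as np

def fit_combination_weights(pred, observed, ridge=1e-3):
    """Least-squares weights for combining K imputation models (hypothetical helper).

    pred:     (n_samples, n_models) expression of one gene in the reference
              cohort, as imputed by each model.
    observed: (n_samples,) measured target-tissue expression of that gene.
    ridge:    assumed stabilising penalty; not taken from the paper.
    """
    X = pred - pred.mean(axis=0)      # centre each model's predictions
    y = observed - observed.mean()    # centre the measured expression
    k = X.shape[1]
    gram = X.T @ X + ridge * len(y) * np.eye(k)
    return np.linalg.solve(gram, X.T @ y)

def meta_impute(pred_new, weights):
    """Weighted sum of the per-model imputed expression levels."""
    return pred_new @ weights

# Simulated example: three models that track the same signal with different noise.
rng = np.random.default_rng(0)
n = 400
signal = rng.normal(size=n)
observed = signal + rng.normal(scale=0.5, size=n)
noise_sd = np.array([0.6, 1.0, 2.0])
pred = signal[:, None] + rng.normal(size=(n, 3)) * noise_sd

train, test = slice(0, 200), slice(200, n)
w = fit_combination_weights(pred[train], observed[train])
combined = meta_impute(pred[test], w)

r_combined = np.corrcoef(combined, observed[test])[0, 1]
r_single = [np.corrcoef(pred[test, j], observed[test])[0, 1] for j in range(3)]
print("weights:", np.round(w, 3))
print("single-model r:", np.round(r_single, 3), "combined r:", round(r_combined, 3))
```

In this toy setting the weights are fitted on one half of the reference cohort and evaluated on the other half, which mirrors the general practice of assessing imputation accuracy on held-out samples.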

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c47/8830793/7e74d43a0d06/pgen.1009571.g001.jpg
