Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome-wide association studies.

Affiliations

Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK.

Division of Musculoskeletal and Dermatological Sciences, University of Manchester, Manchester, UK.

Publication information

Genet Epidemiol. 2020 Jul;44(5):425-441. doi: 10.1002/gepi.22290. Epub 2020 Mar 19.

DOI: 10.1002/gepi.22290

PMID: 32190932

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8641384/

Abstract

In transcriptome-wide association studies (TWAS), gene expression values are predicted using genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six different methods (LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model, and Random Forests) by performing cross-validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when the ancestry of the prediction model training and testing populations is different, and (c) when the tissue used to train the model is different from the tissue to be predicted. We find that, for most genes, expression cannot be accurately predicted, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the model training set size is reduced or when predicting across ancestries, and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and developing large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
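The cross-validation idea in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it uses simulated genotypes instead of Geuvadis data, covers only one of the six methods (LASSO, here fitted by a plain coordinate-descent loop), and uses a fixed penalty rather than a tuned one. All function names and parameter values (`simulate`, `fit_lasso`, `cross_validated_r2`, `lam`, the effect size of 1.0) are assumptions made for illustration. Accuracy is summarised as the squared correlation between observed and out-of-fold predicted expression.

```python
import random
from statistics import fmean


def simulate(n_samples=300, n_snps=15, n_causal=2, seed=0):
    """Simulate 0/1/2 genotypes and an expression trait driven by a few SNPs.

    Purely illustrative data; the effect size and noise level are assumptions.
    """
    rng = random.Random(seed)
    mafs = [rng.uniform(0.1, 0.5) for _ in range(n_snps)]
    X = [[(rng.random() < m) + (rng.random() < m) for m in mafs]
         for _ in range(n_samples)]
    effects = [1.0 if j < n_causal else 0.0 for j in range(n_snps)]
    y = [sum(b * x for b, x in zip(effects, row)) + rng.gauss(0, 1) for row in X]
    return X, y


def soft_threshold(value, lam):
    """Shrink value towards zero by lam (the LASSO proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def fit_lasso(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    x_means = [fmean(row[j] for row in X) for j in range(p)]
    y_mean = fmean(y)
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]
    beta = [0.0] * p
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:  # monomorphic SNP in this training set
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / z[j] - beta[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] += delta
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def squared_correlation(a, b):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return 0.0 if saa == 0 or sbb == 0 else sab * sab / (saa * sbb)


def cross_validated_r2(X, y, lam, k=5):
    """k-fold cross-validation: every sample is predicted by a model not trained on it."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        intercept, beta = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            preds[i] = intercept + sum(b * x for b, x in zip(beta, X[i]))
    return squared_correlation(y, preds)


if __name__ == "__main__":
    X, y = simulate()
    print("5-fold cross-validated R^2:", round(cross_validated_r2(X, y, lam=0.05), 3))
```

To mimic the sample-size comparison, the same function could be called on a subset of the simulated samples (for example `X[:100], y[:100]`). The cross-ancestry and cross-tissue comparisons would instead need separate training and testing data sets, which this sketch does not simulate.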

Similar articles

How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?

Pac Symp Biocomput. 2018;23:228-239.

Meta-imputation of transcriptome from genotypes across multiple datasets by leveraging publicly available summary-level data.

PLoS Genet. 2022 Jan 31;18(1):e1009571. doi: 10.1371/journal.pgen.1009571. eCollection 2022 Jan.

Transferability of Single- and Cross-Tissue Transcriptome Imputation Models Across Ancestry Groups.

Genet Epidemiol. 2025 Jan;49(1):e22611. doi: 10.1002/gepi.22611.

Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improves the power of transcriptome-wide association studies.

PLoS Genet. 2021 Apr 8;17(4):e1008973. doi: 10.1371/journal.pgen.1008973. eCollection 2021 Apr.

Statistical power of transcriptome-wide association studies.

Genet Epidemiol. 2022 Dec;46(8):572-588. doi: 10.1002/gepi.22491. Epub 2022 Jun 29.

Power analysis of transcriptome-wide association study: Implications for practical protocol choice.

PLoS Genet. 2021 Feb 26;17(2):e1009405. doi: 10.1371/journal.pgen.1009405. eCollection 2021 Feb.

TIPS: a novel pathway-guided joint model for transcriptome-wide association studies.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae587.

Multi-tissue transcriptome-wide association studies.

Genet Epidemiol. 2021 Apr;45(3):324-337. doi: 10.1002/gepi.22374. Epub 2020 Dec 28.

A framework for transcriptome-wide association studies in breast cancer in diverse study populations.

Genome Biol. 2020 Feb 20;21(1):42. doi: 10.1186/s13059-020-1942-6.

Cited by

Large-scale multi-omics analyses in Hispanic/Latino populations identify genes for cardiometabolic traits.

Nat Commun. 2025 Apr 11;16(1):3438. doi: 10.1038/s41467-025-58574-z.

Transferability of Single- and Cross-Tissue Transcriptome Imputation Models Across Ancestry Groups.

Genet Epidemiol. 2025 Jan;49(1):e22611. doi: 10.1002/gepi.22611.

Integrating Gene Expression Data into Single-Step Method (ssBLUP) Improves Genomic Prediction Accuracy for Complex Traits of Duroc × Erhualian F2 Pig Population.

Curr Issues Mol Biol. 2024 Dec 3;46(12):13713-13724. doi: 10.3390/cimb46120819.

Co-expression-wide association studies link genetically regulated interactions with complex traits.

medRxiv. 2024 Dec 13:2024.10.02.24314813. doi: 10.1101/2024.10.02.24314813.

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

Ann Appl Stat. 2024 Sep;18(3):1840-1857. doi: 10.1214/23-aoas1859. Epub 2024 Aug 5.

Multivariate adaptive shrinkage improves cross-population transcriptome prediction and association studies in underrepresented populations.

HGG Adv. 2023 Jul 1;4(4):100216. doi: 10.1016/j.xhgg.2023.100216. eCollection 2023 Oct 12.

Transcriptome-wide association studies: recent advances in methods, applications and available databases.

Commun Biol. 2023 Sep 1;6(1):899. doi: 10.1038/s42003-023-05279-y.

Using GWAS summary data to impute traits for genotyped individuals.

HGG Adv. 2023 Apr 12;4(3):100197. doi: 10.1016/j.xhgg.2023.100197. eCollection 2023 Jul 13.

A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation.

bioRxiv. 2023 Oct 22:2023.03.06.531446. doi: 10.1101/2023.03.06.531446.

Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations.

bioRxiv. 2023 May 20:2023.02.09.527747. doi: 10.1101/2023.02.09.527747.
