Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.

Affiliation

Department of Mathematics, Statistics, and Computer Science, St. Olaf College, Northfield, MN 55057, USA.

Publication information

Pac Symp Biocomput. 2020;25:719-730.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6907735/

Abstract

The popularization of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to research the relationship between genetics and human health. Despite the opportunities these datasets provide, they also pose many problems associated with computational time and costs, data size and transfer, and privacy and security. The publishing of summary statistics from these biobanks, and the use of them in a variety of downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach to utilize basic summary statistics (estimates from single marker regressions on single phenotypes) to evaluate more complex phenotypes using multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) utilizing only biobank summary statistics. We validate exact formulas for this method, as well as provide a framework of estimation when specific summary statistics are not available, through simulation. We apply our method to a real data set of fatty acid and genomic data.
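The idea of combining single-marker, single-phenotype regression estimates into an estimate for a principal component can be illustrated with a small simulation. The sketch below is not the authors' published formulas. It assumes that every phenotype was regressed on the same marker and the same covariates, and it relies only on the fact that ordinary least squares is linear in the response, so the marker coefficient for a principal component score equals the loading-weighted sum of the per-phenotype coefficients. All names (`ols_slope`, `betas`, `cov_Y`, the simulated `age` covariate) are hypothetical, and the phenotype covariance matrix used here is the unadjusted one, which may differ from the covariate-adjusted covariance the paper works with.

```python
import numpy as np


def ols_slope(y, g, covars):
    """Return the coefficient on g from an OLS fit of y on [1, g, covars]."""
    X = np.column_stack([np.ones(len(g)), g, covars])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Simulated individual-level data (stands in for data held by a biobank).
rng = np.random.default_rng(0)
n, k = 500, 4                                   # individuals, phenotypes
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
age = rng.normal(50, 10, size=n)                # a single covariate
Y = rng.normal(size=(n, k)) + 0.2 * g[:, None] + 0.01 * age[:, None]

# "Summary statistics": one marker coefficient per phenotype, plus the
# phenotype covariance matrix (assumed to be available).
betas = np.array([ols_slope(Y[:, j], g, age) for j in range(k)])
cov_Y = np.cov(Y, rowvar=False)

# PCA needs only the covariance matrix. eigh returns eigenvalues in
# ascending order, so the last eigenvector holds the first PC loadings.
eigvals, eigvecs = np.linalg.eigh(cov_Y)
w = eigvecs[:, -1]

# Marker effect on the first PC, computed from summary statistics alone.
beta_pc_summary = w @ betas

# Check against the individual-level analysis of the PC scores.
pc_scores = (Y - Y.mean(axis=0)) @ w
beta_pc_direct = ols_slope(pc_scores, g, age)

print(beta_pc_summary, beta_pc_direct)
```

The two printed values should agree up to floating-point error, because centering the phenotypes only shifts the intercept when the regression includes one. Standard errors for the combined coefficient require additional quantities and are not covered by this sketch.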


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7018/6907735/cff2723d3428/nihms-1061512-f0001.jpg



