Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.

Affiliation

Department of Mathematics, Statistics, and Computer Science, St. Olaf College, Northfield, MN 55057, USA

Publication information

Pac Symp Biocomput. 2020;25:719-730.

Abstract

The growth of biobanks provides an unprecedented amount of genetic and phenotypic information that can be used to study the relationship between genetics and human health. Despite the opportunities these datasets offer, they also pose problems of computational time and cost, data size and transfer, and privacy and security. Publishing summary statistics from these biobanks, and using them in downstream statistical analyses, alleviates many of these logistical problems. However, major questions remain about how to use summary statistics in all but the simplest downstream applications. Here, we present a novel approach that uses basic summary statistics (estimates from single-marker regressions on single phenotypes) to evaluate more complex phenotypes with multivariate methods. In particular, we present a covariate-adjusted method for conducting principal component analysis (PCA) using only biobank summary statistics. Through simulation, we validate exact formulas for this method and provide a framework for estimation when specific summary statistics are not available. We apply our method to a real data set of fatty acid and genomic data.
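
The abstract does not give the formulas, so the following is only a minimal sketch of the general linearity property such a method can rely on, not the authors' implementation. It assumes simulated data, a single hypothetical covariate, and that the phenotype covariance matrix is available for computing PCA loadings. Because ordinary least squares is linear in the response, the marker coefficient for a principal component score equals the loading-weighted sum of the single-phenotype marker coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 4  # individuals and phenotypes (illustrative sizes, not from the paper)

g = rng.binomial(2, 0.3, size=n).astype(float)  # marker genotype coded 0/1/2
c = rng.normal(size=n)                          # one hypothetical covariate
X = np.column_stack([np.ones(n), g, c])         # design: intercept, marker, covariate

# Simulated phenotypes that depend on the marker and the covariate
Y = rng.normal(size=(n, k)) + 0.2 * g[:, None] + 0.5 * c[:, None]


def marker_beta(y):
    """Covariate-adjusted OLS coefficient of the marker for response y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# "Summary statistics": one marker coefficient per single phenotype
betas = np.array([marker_beta(Y[:, j]) for j in range(k)])

# PCA loadings from the phenotype covariance matrix (assumed available)
cov = np.cov(Y, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
w = eigvecs[:, -1]                      # loadings of the first principal component

# Marker effect on the first PC, built from summary statistics alone
beta_from_summary = w @ betas

# Same quantity from individual-level data, for comparison
pc_score = (Y - Y.mean(axis=0)) @ w
beta_direct = marker_beta(pc_score)

print(np.isclose(beta_from_summary, beta_direct))
```

The two estimates agree up to floating-point error because centering the phenotypes only changes the intercept. Standard errors and the case of missing summary statistics are not covered by this sketch.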

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7018/6907735/cff2723d3428/nihms-1061512-f0001.jpg
