Rye: genetic ancestry inference at biobank scale.

Affiliations

National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, MD, USA.

IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA, USA.

Publication information

Nucleic Acids Res. 2023 May 8;51(8):e44. doi: 10.1093/nar/gkad149.

Abstract

Biobank projects are generating genomic data for many thousands of individuals. Computational methods are needed to handle these massive data sets, including genetic ancestry (GA) inference tools. Current methods for GA inference do not scale to biobank-size genomic datasets. We present Rye, a new algorithm for GA inference at biobank scale. We compared the accuracy and runtime performance of Rye to the widely used RFMix, ADMIXTURE and iAdmix programs and applied it to a dataset of 488,221 genome-wide variant samples from the UK Biobank. Rye infers GA based on principal component analysis of genomic variant samples from ancestral reference populations and query individuals. The algorithm's accuracy is powered by Metropolis-Hastings optimization and its speed is provided by non-negative least squares regression. Rye produces highly accurate GA estimates for three-way admixed populations (African, European and Native American) compared to RFMix and ADMIXTURE ($R^2 = 0.998$ to $1.00$), and shows a 50× runtime improvement compared to ADMIXTURE on the UK Biobank dataset. Rye analysis of UK Biobank samples demonstrates how it can be used to infer GA at both continental and subcontinental levels. We discuss user considerations and options for the use of Rye; the program and its documentation are distributed on the GitHub repository: https://github.com/healthdisparities/rye.
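The non-negative least squares step mentioned in the abstract can be illustrated with a small sketch. This is not Rye's implementation: it skips the principal component analysis and the Metropolis-Hastings optimization, the reference centroids are made-up numbers in a hypothetical two-component space, and a plain projected gradient loop stands in for a dedicated NNLS solver. The function names are illustrative only.

```python
# Hypothetical reference population centroids in a 2-PC space.
# These values are invented for illustration, not taken from the paper.
REFERENCE_CENTROIDS = {
    "African": (-1.0, 0.0),
    "European": (1.0, 0.0),
    "Native American": (0.0, 1.0),
}


def nnls_projected_gradient(columns, target, steps=2000, lr=0.1):
    """Minimise ||A x - target||^2 subject to x >= 0.

    `columns` holds the columns of A, one per reference population.
    Projected gradient descent is used as a simple stand-in for a
    dedicated NNLS solver.
    """
    n_rows = len(target)
    x = [1.0 / len(columns)] * len(columns)
    for _ in range(steps):
        # Residual r = A x - target.
        residual = [
            sum(col[i] * xj for col, xj in zip(columns, x)) - target[i]
            for i in range(n_rows)
        ]
        # Gradient of 0.5 * ||r||^2 with respect to x is A^T r.
        grad = [sum(col[i] * residual[i] for i in range(n_rows)) for col in columns]
        # Gradient step, then project onto the non-negative orthant.
        x = [max(0.0, xj - lr * gj) for xj, gj in zip(x, grad)]
    return x


def estimate_ancestry(query_pcs, centroids=REFERENCE_CENTROIDS):
    """Return ancestry fractions for one query point in PC space."""
    names = list(centroids)
    # An extra row of ones pushes the fitted weights towards summing to one.
    columns = [tuple(centroids[name]) + (1.0,) for name in names]
    target = tuple(query_pcs) + (1.0,)
    weights = nnls_projected_gradient(columns, target)
    total = sum(weights)
    return {name: w / total for name, w in zip(names, weights)}


# A query placed at 50% African, 30% European, 20% Native American centroids.
fractions = estimate_ancestry((-0.2, 0.2))
```

In this toy setting the query point is an exact mixture of the three centroids, so the fitted weights recover the mixing proportions. Real data would need the PCA projection and a tuned reference representation, which is where the abstract places the Metropolis-Hastings step.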
