Inferring population structure in biobank-scale genomic data.

Affiliations

Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.

Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA; Institute for Advanced Computer Studies, University of Maryland, College Park, College Park, MD 20742, USA.

Publication information

Am J Hum Genet. 2022 Apr 7;109(4):727-737. doi: 10.1016/j.ajhg.2022.02.015. Epub 2022 Mar 16.

DOI: 10.1016/j.ajhg.2022.02.015
PMID: 35298920
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9069078/
Abstract

Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.
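
The abstract notes that SCOPE can use allele frequencies from previous studies. To make that idea concrete, here is a minimal Python sketch under stated assumptions. It fits the standard admixture model, in which an individual's expected genotype at variant m is 2·Σ_k P[m][k]·q_k, by least squares. The population allele-frequency matrix P is held fixed (as if taken from a reference study), and the proportions q are constrained to the probability simplex. The function names (`project_to_simplex`, `estimate_proportions`, `simulate_genotypes`), the projected-gradient solver and the simulated data are hypothetical illustrations. This is not SCOPE's implementation and says nothing about how SCOPE reaches biobank scale.

```python
import random
from typing import List, Sequence


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection of v onto {q : q_k >= 0, sum(q) == 1}.

    Sort-based algorithm of Duchi et al. (2008).
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:  # the largest such j fixes the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(genotypes: Sequence[int],
                         freqs: Sequence[Sequence[float]],
                         steps: int = 200) -> List[float]:
    """Hypothetical helper: least-squares admixture proportions q for one
    individual, with the M x K allele-frequency matrix `freqs` held fixed.

    Minimises sum_m (g_m - 2 * sum_k freqs[m][k] * q_k)^2 over the simplex
    by projected gradient descent.
    """
    k = len(freqs[0])
    # Precompute A^T A and A^T g once; each iteration is then O(K^2).
    ata = [[sum(row[a] * row[b] for row in freqs) for b in range(k)]
           for a in range(k)]
    atg = [sum(row[a] * g for row, g in zip(freqs, genotypes))
           for a in range(k)]
    # The Hessian is 8 A^T A; its largest eigenvalue is <= 8 * trace(A^T A),
    # so the reciprocal of that bound is a safe step size.
    step = 1.0 / (8.0 * sum(ata[a][a] for a in range(k)))
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(steps):
        grad = [-4.0 * atg[a] + 8.0 * sum(ata[a][b] * q[b] for b in range(k))
                for a in range(k)]
        q = project_to_simplex([q[a] - step * grad[a] for a in range(k)])
    return q


def simulate_genotypes(freqs: Sequence[Sequence[float]],
                       q: Sequence[float],
                       rng: random.Random) -> List[int]:
    """Draw g_m ~ Binomial(2, p_m) with p_m = sum_k freqs[m][k] * q_k."""
    genotypes = []
    for row in freqs:
        p = sum(f * w for f, w in zip(row, q))
        genotypes.append(sum(rng.random() < p for _ in range(2)))
    return genotypes


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated "reference" frequencies for K = 2 populations (made-up data).
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(2)] for _ in range(5000)]
    genotypes = simulate_genotypes(freqs, [0.3, 0.7], rng)
    q_hat = estimate_proportions(genotypes, freqs)
    print("estimated proportions:", [round(x, 3) for x in q_hat])
```

Because each column of `freqs` stands for a named reference population, the entries of `q_hat` come out already labeled with those populations. That is one plausible reading of how prior allele frequencies can make structure estimates easier to interpret.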

Figure images:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dec/9069078/4f5798d8b3fe/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dec/9069078/bd8256611c3e/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dec/9069078/7a949d7d3a3a/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dec/9069078/cee0661764bd/gr3.jpg
