Fast and accurate population admixture inference from genotype data from a few microsatellites to millions of SNPs.

Affiliation

Institute of Zoology, Zoological Society of London, London, NW1 4RY, UK.

Publication information

Heredity (Edinb). 2022 Aug;129(2):79-92. doi: 10.1038/s41437-022-00535-z. Epub 2022 May 4.

DOI: 10.1038/s41437-022-00535-z
PMID: 35508539
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9338324/
Abstract

Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods were developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods either are computationally demanding and thus applicable to small problems only or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data from a few multiallelic microsatellites to millions of diallelic SNPs. The methods conduct first a clustering analysis of coarse-grained population structure by using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure by using the clustering results as a starting point in an expectation maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites but can also handle in parallel terabytes of data with millions of markers and millions of individuals. In difficult situations such as many and/or lowly differentiated populations, unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
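
The two-stage idea in the abstract (a clustering result used as the starting point of an expectation maximisation admixture analysis) can be illustrated with a minimal sketch. This is not the author's implementation: it assumes diploid individuals, diallelic SNPs coded as allele counts 0/1/2, and hard cluster labels that are already available from some earlier clustering step (the simulated annealing stage is not shown). The updates are the standard EM updates for the admixture model, and the names `init_from_labels`, `em_admixture`, `smooth` and `eps` are hypothetical.

```python
import numpy as np

def init_from_labels(G, labels, K, smooth=0.1):
    """Turn hard cluster labels into smoothed starting values (Q, F).

    G: (N, L) allele counts in {0, 1, 2}; labels: (N,) ints in 0..K-1.
    Assumes every cluster has at least one member.
    """
    G = np.asarray(G, dtype=float)
    labels = np.asarray(labels)
    # Smoothing keeps every Q entry positive; EM never revives an exact zero.
    Q = (1.0 - smooth) * np.eye(K)[labels] + smooth / K
    # Cluster allele frequency = mean allele count / 2 (diploid, diallelic).
    F = np.stack([G[labels == k].mean(axis=0) / 2.0 for k in range(K)])
    return Q, F

def em_admixture(G, Q, F, n_iter=200, eps=1e-6):
    """EM for admixture proportions Q (N, K) and allele frequencies F (K, L)."""
    G = np.asarray(G, dtype=float)
    n_loci = G.shape[1]
    for _ in range(n_iter):
        F = np.clip(F, eps, 1.0 - eps)   # avoid division by zero below
        p1 = Q @ F                       # P(counted allele) per individual and locus
        p0 = Q @ (1.0 - F)               # P(other allele)
        w1 = G / p1                      # counted-allele copies, scaled
        w0 = (2.0 - G) / p0              # other-allele copies, scaled
        # Expected allele copies attributed to each cluster at each locus.
        c1 = F * (Q.T @ w1)
        c0 = (1.0 - F) * (Q.T @ w0)
        # Each individual's share of its 2 * n_loci allele copies per cluster.
        Q = Q * (w1 @ F.T + w0 @ (1.0 - F).T) / (2.0 * n_loci)
        Q = Q / Q.sum(axis=1, keepdims=True)   # guard against rounding drift
        F = c1 / (c1 + c0)
    return Q, F
```

A call such as `Q, F = em_admixture(G, *init_from_labels(G, labels, K=2))` returns one row of admixture proportions per individual. Starting from a sensible clustering rather than random values is the design point the abstract emphasises; the smoothing step is an assumption added here so that the EM updates can move individuals away from their initial cluster.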
