Population identification using genetic data.

Affiliation

Heilbronn Institute for Mathematical Research, School of Mathematics, University of Bristol, Bristol BS8 1TW, UK.

Publication information

Annu Rev Genomics Hum Genet. 2012;13:337-61. doi: 10.1146/annurev-genom-082410-101510. Epub 2012 Jun 11.

DOI:10.1146/annurev-genom-082410-101510

PMID:22703172

Abstract

A large number of algorithms have been developed to classify individuals into discrete populations using genetic data. Recent results show that the information used by both model-based clustering methods and principal components analysis can be summarized by a matrix of pairwise similarity measures between individuals. Similarity matrices have been constructed in a number of ways, usually treating markers as independent but differing in the weighting given to polymorphisms of different frequencies. Additionally, methods are now being developed that take linkage into account. We review several such matrices and evaluate their information content. A two-stage approach for population identification is to first construct a similarity matrix and then perform clustering. We review a range of common clustering algorithms and evaluate their performance through a simulation study. The clustering step can be performed either on the matrix or by first using a dimension-reduction technique; we find that the latter approach substantially improves the performance of most algorithms. Based on these results, we describe the population structure signal contained in each similarity matrix and find that accounting for linkage leads to significant improvements for sequence data. We also perform a comparison on real data, where we find that population genetics models outperform generic clustering approaches, particularly with regard to robustness for features such as relatedness between individuals.
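The two-stage approach described in the abstract (build a pairwise similarity matrix, reduce its dimension, then cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the toy simulation, the frequency-normalized covariance used as the similarity measure (one choice that treats markers as independent and ignores linkage), the use of K-1 leading eigenvectors, and plain k-means as the clustering step. All function names are hypothetical.

```python
import numpy as np


def simulate_genotypes(n_per_pop=20, n_markers=1000, drift_sd=0.15, seed=1):
    """Toy data (assumption): two populations drifted from shared base frequencies."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, size=n_markers)
    blocks = []
    for _ in range(2):
        freq = np.clip(base + rng.normal(0.0, drift_sd, size=n_markers), 0.05, 0.95)
        # Diploid genotypes coded as allele counts 0, 1 or 2.
        blocks.append(rng.binomial(2, freq, size=(n_per_pop, n_markers)))
    truth = np.repeat([0, 1], n_per_pop)
    return np.vstack(blocks), truth


def similarity_matrix(genotypes):
    """Stage 1: frequency-normalized covariance between individuals."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # sample allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)                # monomorphic markers carry no signal
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def top_components(sim, n_components):
    """Dimension reduction: leading eigenvectors scaled by sqrt(eigenvalue)."""
    vals, vecs = np.linalg.eigh(sim)            # eigenvalues in ascending order
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


def kmeans(points, k, n_iter=50, seed=0):
    """Stage 2: plain Lloyd's k-means, a stand-in for any generic clustering method."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_individuals(genotypes, k):
    """Similarity matrix, then K-1 components (a heuristic), then k-means."""
    sim = similarity_matrix(genotypes)
    coords = top_components(sim, max(k - 1, 1))
    return kmeans(coords, k)


if __name__ == "__main__":
    geno, truth = simulate_genotypes()
    labels = cluster_individuals(geno, k=2)
    agreement = (labels == truth).mean()
    # Cluster labels are arbitrary, so score agreement up to a label swap.
    print("agreement with simulated populations:", max(agreement, 1 - agreement))
```

Clustering the rows of the similarity matrix directly, instead of the reduced coordinates, would correspond to the other option mentioned in the abstract; swapping `top_components` out of `cluster_individuals` is enough to compare the two on the toy data.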

Similar articles

Population model-based inter-diplotype similarity measure for accurate diplotype clustering.

J Comput Biol. 2012 Jan;19(1):55-67. doi: 10.1089/cmb.2010.0227. Epub 2011 Dec 9.

The effect of close relatives on unsupervised Bayesian clustering algorithms in population genetic structure analysis.

Mol Ecol Resour. 2012 Sep;12(5):873-84. doi: 10.1111/j.1755-0998.2012.03156.x. Epub 2012 May 28.

Comparison of algorithms to infer genetic population structure from unlinked molecular markers.

Stat Appl Genet Mol Biol. 2014 Aug;13(4):391-402. doi: 10.1515/sagmb-2013-0006.

AMOVA-based clustering of population genetic data.

J Hered. 2012 Sep-Oct;103(5):744-50. doi: 10.1093/jhered/ess047. Epub 2012 Aug 15.

A new method to estimate relatedness from molecular markers.

Mol Ecol. 2006 May;15(6):1657-67. doi: 10.1111/j.1365-294X.2006.02873.x.

Inference of population structure using genetic markers and a Bayesian model averaging approach for clustering.

J Comput Biol. 2008 Mar;15(2):207-20. doi: 10.1089/cmb.2007.0051.

Joint analysis of demography and selection in population genetics: where do we stand and where could we go?

Mol Ecol. 2012 Jan;21(1):28-44. doi: 10.1111/j.1365-294X.2011.05308.x. Epub 2011 Oct 14.

Fine mapping of disease genes using tagging SNPs.

Ann Hum Genet. 2007 Nov;71(Pt 6):815-27. doi: 10.1111/j.1469-1809.2007.00379.x. Epub 2007 Jun 22.

Comparison of SNPs and microsatellites for assessing the genetic structure of chicken populations.

Anim Genet. 2012 Aug;43(4):419-28. doi: 10.1111/j.1365-2052.2011.02284.x. Epub 2011 Nov 8.

Cited by

Human genetic structure in Northwest France provides new insights into West European historical demography.

Nat Commun. 2024 Aug 7;15(1):6710. doi: 10.1038/s41467-024-51087-1.

Genetic diversity among maize (Zea mays L.) inbred lines adapted to Japanese climates.

PLoS One. 2024 Jan 25;19(1):e0297549. doi: 10.1371/journal.pone.0297549. eCollection 2024.

Scrutinising an inscrutable bark-nesting ant: Exploring cryptic diversity in the (Hymenoptera: Formicidae) complex using DNA barcodes, genome-wide MIG-seq and geometric morphometrics.

PeerJ. 2023 Nov 16;11:e16416. doi: 10.7717/peerj.16416. eCollection 2023.

Japan considered from the hypothesis of farmer/language spread.

Evol Hum Sci. 2020 May 5;2:e13. doi: 10.1017/ehs.2020.7. eCollection 2020.

Hybrid autoencoder with orthogonal latent space for robust population structure inference.

Sci Rep. 2023 Feb 14;13(1):2612. doi: 10.1038/s41598-023-28759-x.

Association analysis for resistance to Striga hermonthica in diverse tropical maize inbred lines.

Sci Rep. 2021 Dec 17;11(1):24193. doi: 10.1038/s41598-021-03566-4.

Papua New Guinean Genomes Reveal the Complex Settlement of North Sahul.

Mol Biol Evol. 2021 Oct 27;38(11):5107-5121. doi: 10.1093/molbev/msab238.

Genetic diversity and inter-trait relationship of tropical extra-early maturing quality protein maize inbred lines under low soil nitrogen stress.

PLoS One. 2021 Jun 11;16(6):e0252506. doi: 10.1371/journal.pone.0252506. eCollection 2021.

Genomic Analyses Unveil Helmeted Guinea Fowl (Numida meleagris) Domestication in West Africa.

Genome Biol Evol. 2021 Jun 8;13(6). doi: 10.1093/gbe/evab090.

Genomic biosurveillance of forest invasive alien enemies: A story written in code.

Evol Appl. 2019 Sep 10;13(1):95-115. doi: 10.1111/eva.12853. eCollection 2020 Jan.
