Inference of population structure using dense haplotype data.

Affiliation

Department of Mathematics, University of Bristol, Bristol, United Kingdom.

Publication information

PLoS Genet. 2012 Jan;8(1):e1002453. doi: 10.1371/journal.pgen.1002453. Epub 2012 Jan 26.

DOI:10.1371/journal.pgen.1002453

PMID:22291602

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3266881/

Abstract

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
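
As a rough illustration of the coancestry matrix in the independent-marker case described in the abstract, the sketch below counts, for each recipient haplotype, the expected number of markers copied from every other haplotype. It assumes a simplified copying rule (at each marker the recipient copies uniformly at random from the donors carrying the same allele, falling back to all donors if none match). The function name `unlinked_coancestry` and the toy data are hypothetical, and this is not the ChromoPainter algorithm, which models linkage between successive markers.

```python
def unlinked_coancestry(haplotypes):
    """Expected copy counts under a simplified unlinked-marker painting.

    haplotypes: list of equal-length 0/1 allele lists, one per haploid individual.
    Returns x, where x[i][j] is the expected number of markers that
    recipient i copies from donor j (x[i][i] is always 0).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes (one recipient, one donor)")
    n_markers = len(haplotypes[0])
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # each individual in turn is the recipient
        for m in range(n_markers):
            # Assumed rule: donors are the other haplotypes sharing the allele.
            donors = [j for j in range(n)
                      if j != i and haplotypes[j][m] == haplotypes[i][m]]
            if not donors:                  # nobody matches: any other haplotype may donate
                donors = [j for j in range(n) if j != i]
            share = 1.0 / len(donors)       # uniform choice among eligible donors
            for j in donors:
                x[i][j] += share
    return x


# Toy data: haplotypes 0-1 and 2-3 form two groups that differ at the first five markers.
toy = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
]
x = unlinked_coancestry(toy)
for row in x:
    print(row)
```

Each row sums to the number of markers, because every marker contributes exactly one expected copying event, and haplotypes from the same toy group receive more from each other than from the other group. A matrix of this kind is the sort of summary to which PCA or a clustering model could then be applied.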

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f0b/3266881/9c3ab2f98cfa/pgen.1002453.g001.jpg

Similar articles

FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data.

BMC Bioinformatics. 2016 Mar 9;17:122. doi: 10.1186/s12859-016-0965-1.

Haplotype information and linkage disequilibrium mapping for single nucleotide polymorphisms.

Genome Res. 2003 Sep;13(9):2112-7. doi: 10.1101/gr.586803.

Sparse haplotype-based fine-scale local ancestry inference at scale reveals recent selection on immune responses.

Nat Commun. 2025 Mar 20;16(1):2742. doi: 10.1038/s41467-025-57601-3.

Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map.

PLoS One. 2017 Jan 30;12(1):e0170941. doi: 10.1371/journal.pone.0170941. eCollection 2017.

Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data.

Bioinformatics. 2013 Sep 15;29(18):2245-52. doi: 10.1093/bioinformatics/btt386. Epub 2013 Jul 3.

Linkage disequilibrium mapping via cladistic analysis of single-nucleotide polymorphism haplotypes.

Am J Hum Genet. 2004 Jul;75(1):35-43. doi: 10.1086/422174. Epub 2004 May 13.

Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations.

Sci Rep. 2021 Sep 7;11(1):17741. doi: 10.1038/s41598-021-97129-2.

Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies.

Genome Res. 2004 May;14(5):908-16. doi: 10.1101/gr.1837404. Epub 2004 Apr 12.

SURFBAT: a surrogate family based association test building on large imputation reference panels.

G3 (Bethesda). 2025 Apr 17;15(4). doi: 10.1093/g3journal/jkae287.

Cited by

Ancient genomes provide evidence of demographic shift to Slavic-associated groups in Moravia.

Genome Biol. 2025 Sep 3;26(1):259. doi: 10.1186/s13059-025-03700-9.

The global genomic landscape of hypervirulent from 1932 to 2021.

mLife. 2025 Aug 24;4(4):378-396. doi: 10.1002/mlf2.70029. eCollection 2025 Aug.

Genetic Consequences of Tree Planting Versus Natural Colonisation: Implications for Afforestation Programmes in the United Kingdom.

Evol Appl. 2025 Aug 27;18(8):e70146. doi: 10.1111/eva.70146. eCollection 2025 Aug.

Population structure of three New Zealand crested penguins identifies current conservation challenges for the Fiordland penguin/tawaki, erect-crested penguin, and eastern rockhopper penguin.

PLoS One. 2025 Aug 27;20(8):e0329545. doi: 10.1371/journal.pone.0329545. eCollection 2025.

The History of the Panmictic Population Concept and Its Legacy in Contemporary Population Genetics.

Ann Hum Genet. 2025 Sep;89(5):274-284. doi: 10.1111/ahg.70015. Epub 2025 Jul 28.

Generating realistic artificial human genomes using adversarial autoencoders.

NAR Genom Bioinform. 2025 Jul 24;7(3):lqaf101. doi: 10.1093/nargab/lqaf101. eCollection 2025 Sep.

A draft UAE-based Arab pangenome reference.

Nat Commun. 2025 Jul 24;16(1):6747. doi: 10.1038/s41467-025-61645-w.

Power and Limitations of Inferring Genetic Ancestry.

Ann Hum Genet. 2025 Sep;89(5):264-273. doi: 10.1111/ahg.70007. Epub 2025 Jul 15.

Recomb-Mix: fast and accurate local ancestry inference.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i180-i188. doi: 10.1093/bioinformatics/btaf227.

AncestryGeni: a novel genetic ancestry classification pipeline for small and noisy sequence data.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf391.
