Discriminant analysis of principal components: a new method for the analysis of genetically structured populations.

Affiliation

MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College Faculty of Medicine, St Mary's Campus, Norfolk Place, London W2 1PG, UK.

Publication information

BMC Genet. 2010 Oct 15;11:94. doi: 10.1186/1471-2156-11-94.

DOI:10.1186/1471-2156-11-94

PMID:20950446

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2973851/

Abstract

BACKGROUND

The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations.

RESULTS

We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach extracts rich information from genetic data: it assigns individuals to groups, gives a visual assessment of between-population differentiation, and quantifies the contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza.
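The workflow described above (principal components first, K-means with model selection when groups are unknown, then a discriminant analysis of the retained components) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`pca`, `kmeans`, `bic_curve`, `discriminant_axes`), the simulated allele counts, the BIC-style score `n*log(WSS/n) + K*log(n)`, the choice of 20 retained components, and the nearest-centroid assignment rule are all hypothetical choices made for the example.

```python
import numpy as np

def pca(X, n_pcs):
    """Return (scores, loadings) for the first n_pcs principal components."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs], Vt[:n_pcs].T

def kmeans(Z, k, n_init=50, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (labels, within-group SS)."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old centre if a cluster happens to become empty.
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        # Squared distances of every individual to the final centres.
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        wss = dist[np.arange(len(Z)), labels].sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss

def bic_curve(Z, k_max):
    """BIC-style score for K = 1..k_max (assumed form, see lead-in)."""
    n = len(Z)
    fits = {k: kmeans(Z, k) for k in range(1, k_max + 1)}
    bic = {k: n * np.log(wss / n) + k * np.log(n) for k, (_, wss) in fits.items()}
    return bic, fits

def discriminant_axes(Z, labels):
    """Axes maximising between-group relative to within-group variation."""
    groups = np.unique(labels)
    grand = Z.mean(axis=0)
    p = Z.shape[1]
    W, B = np.zeros((p, p)), np.zeros((p, p))
    for g in groups:
        Zg = Z[labels == g]
        mg = Zg.mean(axis=0)
        W += (Zg - mg).T @ (Zg - mg)                      # within-group scatter
        B += len(Zg) * np.outer(mg - grand, mg - grand)   # between-group scatter
    # Whiten by W so the problem becomes a symmetric eigen-decomposition.
    w, V = np.linalg.eigh(W)
    W_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    evals, evecs = np.linalg.eigh(W_inv_sqrt @ B @ W_inv_sqrt)
    top = np.argsort(evals)[::-1][: len(groups) - 1]
    return W_inv_sqrt @ evecs[:, top]

# Toy data (assumption): 3 groups x 30 diploid individuals, 200 biallelic loci.
rng = np.random.default_rng(42)
n_per_group, n_loci, n_groups = 30, 200, 3
freqs = rng.uniform(0.1, 0.9, size=(n_groups, n_loci))
truth = np.repeat(np.arange(n_groups), n_per_group)
X = rng.binomial(2, freqs[truth]).astype(float)   # allele counts 0, 1 or 2

# Step 1: infer clusters with K-means on all PCs, scoring each K.
all_scores, _ = pca(X, n_pcs=X.shape[0] - 1)
bic, fits = bic_curve(all_scores, k_max=6)
best_k = min(bic, key=bic.get)
clusters = fits[best_k][0]

# Step 2: discriminant analysis on a reduced set of PCs.
scores, loadings = pca(X, n_pcs=20)
axes = discriminant_axes(scores, clusters)
coords = scores @ axes                      # individuals in discriminant space

# Step 3: group assignment (nearest centroid) and allele contributions.
groups = np.unique(clusters)
centroids = np.array([coords[clusters == g].mean(axis=0) for g in groups])
nearest = ((coords[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
assigned = groups[nearest]
allele_loadings = loadings @ axes
contrib = allele_loadings ** 2 / (allele_loadings ** 2).sum(axis=0)

print("chosen K:", best_k, "| discriminant coordinates:", coords.shape)
```

Reducing the data to principal components before the discriminant step keeps the within-group scatter matrix invertible even when there are more alleles than individuals, which is why the sketch runs the discriminant analysis on 20 components rather than on the 200 raw loci. On real data, the score curve in `bic` is worth inspecting as a whole rather than reading off its minimum alone.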

CONCLUSIONS

Analysis of simulated data revealed that our approach generally performs better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and the graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.
