Iterative pruning PCA improves resolution of highly structured populations.

Affiliation

BIOTEC 113 Thailand Science Park, Paholyothin Road, Klong 1, Klong Luang, Pathumtani 12120, Thailand.

Publication information

BMC Bioinformatics. 2009 Nov 23;10:382. doi: 10.1186/1471-2105-10-382.

PMID:19930644

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2790469/

Abstract

BACKGROUND

Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are thus increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, their accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming.
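The overlap problem described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: the allele frequencies, sample sizes, and the helper names `simulate_genotypes` and `pca_scores` are all assumptions chosen for illustration. Two closely related populations (A and B) and one distant population (C) are simulated, and the first principal component is computed from the column-centred genotype matrix.

```python
import numpy as np


def simulate_genotypes(freqs, n_per_pop, rng):
    """Draw diploid genotypes (0/1/2 allele counts) for each population.

    freqs has shape (n_pops, n_snps); this simulator is a hypothetical
    stand-in, not the simulation procedure used in the paper.
    """
    blocks = [rng.binomial(2, p, size=(n_per_pop, p.size)) for p in freqs]
    labels = np.repeat(np.arange(len(freqs)), n_per_pop)
    return np.vstack(blocks).astype(float), labels


def pca_scores(genotypes, n_components=2):
    """Project individuals onto the leading principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)  # centre each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_snps = 500
base = rng.uniform(0.2, 0.8, n_snps)
freq_a = base                                                        # population A
freq_b = np.clip(base + rng.normal(0.0, 0.03, n_snps), 0.01, 0.99)  # close to A
freq_c = 1.0 - base                                                  # far from A and B

genotypes, labels = simulate_genotypes(np.array([freq_a, freq_b, freq_c]), 30, rng)
pc1 = pca_scores(genotypes)[:, 0]
centroids = [pc1[labels == k].mean() for k in range(3)]
print("PC1 centroids for A, B, C:", np.round(centroids, 2))
```

Under these assumed parameters, the distance between the A and B centroids on the first component should be small compared with their distance to C, which is the conglomerate effect the abstract refers to.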

RESULTS

A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by other methods.
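The general idea of iterative pruning can be sketched as a recursion: run PCA on the current group, decide whether it still shows structure, split it in two if so, and repeat PCA separately on each half. The sketch below is an illustration under stated assumptions rather than the authors' implementation. The abstract does not specify the stopping criterion or the clustering step, so the eigenvalue threshold in `has_structure` (a simple random-matrix noise bound with an arbitrary `factor`) and the 2-means split on the first principal component are stand-ins, and all function names and simulation parameters are hypothetical.

```python
import numpy as np


def leading_pc(genotypes):
    """Return PC1 scores and all eigenvalues of the column-centred matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0], s ** 2


def has_structure(eigenvalues, n, m, factor=1.5):
    """ASSUMED stopping rule, not the paper's test: flag structure when the
    top eigenvalue clearly exceeds a rough bound for pure noise."""
    noise_var = eigenvalues.sum() / (n * m)
    noise_edge = noise_var * (np.sqrt(n) + np.sqrt(m)) ** 2
    return eigenvalues[0] > factor * noise_edge


def two_means_1d(values, max_iter=100):
    """Plain 2-means on a 1-D array; True marks the upper cluster."""
    lo, hi = values.min(), values.max()
    upper = np.abs(values - hi) < np.abs(values - lo)
    for _ in range(max_iter):
        lo, hi = values[~upper].mean(), values[upper].mean()
        new_upper = np.abs(values - hi) < np.abs(values - lo)
        if np.array_equal(new_upper, upper):
            break
        upper = new_upper
    return upper


def ippca(genotypes, indices=None, min_size=10, factor=1.5):
    """Recursively split individuals; each returned index array is one
    inferred subpopulation, so the list length is the inferred count."""
    if indices is None:
        indices = np.arange(genotypes.shape[0])
    if len(indices) < min_size:
        return [indices]
    subset = genotypes[indices]  # PCA is recomputed on this subset only
    pc1, eigenvalues = leading_pc(subset)
    if not has_structure(eigenvalues, *subset.shape, factor=factor):
        return [indices]
    upper = two_means_1d(pc1)
    return (ippca(genotypes, indices[~upper], min_size, factor)
            + ippca(genotypes, indices[upper], min_size, factor))


# Hypothetical data: A and B are close, C is further away.
rng = np.random.default_rng(0)
n_snps, n_per_pop = 2000, 30
base = rng.uniform(0.35, 0.65, n_snps)
freqs = (base,
         base + 0.15 * rng.choice([-1.0, 1.0], n_snps),
         base + 0.30 * rng.choice([-1.0, 1.0], n_snps))
genotypes = np.vstack(
    [rng.binomial(2, p, size=(n_per_pop, n_snps)) for p in freqs]
).astype(float)

clusters = ippca(genotypes)
print(len(clusters), [len(c) for c in clusters])
```

Because PCA is recomputed within each subset, a distant group such as C no longer dominates the leading component once it has been split off, which gives the closer groups a chance to separate at the next level. No prior population labels are used at any step.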

CONCLUSION

The algorithm is computationally efficient and not constrained by the dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
