Iterative pruning PCA improves resolution of highly structured populations.

Affiliation

BIOTEC, 113 Thailand Science Park, Paholyothin Road, Klong 1, Klong Luang, Pathumtani 12120, Thailand.

Publication information

BMC Bioinformatics. 2009 Nov 23;10:382. doi: 10.1186/1471-2105-10-382.

Abstract

BACKGROUND

Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. As a result, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are thus increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, their accuracy in resolving subpopulations and assigning individuals to them is wanting. When subpopulations are closely related to one another, they overlap in PCA space and appear as a single conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework which addresses this shortcoming.

RESULTS

We developed a novel population structure analysis algorithm, iterative pruning PCA (ipPCA), which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with those of the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by the other methods.
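The general idea of iterating PCA and splitting until no further structure is found can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering step or the stopping criterion, so the 2-means split on the top two principal components and the column-shuffling test (`has_structure`, with its `n_perm` and `margin` parameters) are stand-in assumptions, and every function name below, including the toy simulator `simulate_genotypes`, is hypothetical.

```python
import numpy as np


def pc_scores(X, k=2):
    """Scores of the rows of X on the top-k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]


def has_structure(X, rng, n_perm=20, margin=1.1):
    """Stand-in stopping test (an assumption, not the paper's criterion):
    the leading singular value must clearly exceed that of copies of X
    whose columns were shuffled independently."""
    Xc = X - X.mean(axis=0)
    observed = np.linalg.svd(Xc, compute_uv=False)[0]
    null_max = 0.0
    for _ in range(n_perm):
        shuffled = rng.permuted(Xc, axis=0)  # shuffles within each column
        null_max = max(null_max, np.linalg.svd(shuffled, compute_uv=False)[0])
    return observed > margin * null_max


def two_means(Z, n_iter=100):
    """Plain 2-means, seeded with the two extreme points along the first PC."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]].astype(float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():  # one cluster is empty: give up
            break
        new_centers = np.array([Z[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def ippca_sketch(X, rng, min_size=10):
    """Assign each row of X to a subpopulation by repeated PCA and bisection."""
    labels = np.zeros(len(X), dtype=int)
    pending, n_found = [np.arange(len(X))], 0
    while pending:
        idx = pending.pop()
        split = None
        if len(idx) >= 2 * min_size and has_structure(X[idx], rng):
            # PCA is recomputed on this subset only, so distant groups
            # removed earlier no longer dominate the leading components.
            split = two_means(pc_scores(X[idx]))
        if split is None or split.min() == split.max():
            labels[idx] = n_found  # terminal subpopulation, not split further
            n_found += 1
        else:
            pending += [idx[split == 0], idx[split == 1]]
    return labels


def simulate_genotypes(rng, n_per_pop=40, n_snps=1000):
    """Toy 0/1/2 genotypes: populations 0 and 1 are close, population 2 is distant."""
    base = rng.uniform(0.2, 0.8, n_snps)
    freqs = [
        np.clip(base + rng.normal(0, 0.15, n_snps), 0.05, 0.95),
        np.clip(base + rng.normal(0, 0.15, n_snps), 0.05, 0.95),
        rng.uniform(0.05, 0.95, n_snps),
    ]
    X = np.vstack([rng.binomial(2, p, size=(n_per_pop, n_snps)) for p in freqs])
    return X.astype(float), np.repeat(np.arange(3), n_per_pop)
```

To try it, create a generator with `rng = np.random.default_rng(0)`, draw data with `X, truth = simulate_genotypes(rng)`, and compare `ippca_sketch(X, rng)` against `truth`. The number of distinct labels returned is the inferred number of subpopulations, so no prior population labels are needed. Recomputing PCA inside each subset is what lets closely related groups separate once a distant group has been set aside.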

CONCLUSION

The algorithm is computationally efficient and is not constrained by dataset complexity. This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/76a6/2790469/7944c0239531/1471-2105-10-382-1.jpg
