Discriminant analysis of principal components: a new method for the analysis of genetically structured populations.

Affiliation

MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College Faculty of Medicine, St Mary's Campus, Norfolk Place, London W2 1PG, UK.

Publication information

BMC Genet. 2010 Oct 15;11:94. doi: 10.1186/1471-2156-11-94.

Abstract

BACKGROUND

The dramatic progress in sequencing technologies offers unprecedented prospects for deciphering the organization of natural populations in space and time. However, the size of the datasets generated also poses some daunting challenges. In particular, Bayesian clustering algorithms based on pre-defined population genetics models such as the STRUCTURE or BAPS software may not be able to cope with this unprecedented amount of data. Thus, there is a need for less computer-intensive approaches. Multivariate analyses seem particularly appealing as they are specifically devoted to extracting information from large datasets. Unfortunately, currently available multivariate methods still lack some essential features needed to study the genetic structure of natural populations.

RESULTS

We introduce the Discriminant Analysis of Principal Components (DAPC), a multivariate method designed to identify and describe clusters of genetically related individuals. When group priors are lacking, DAPC uses sequential K-means and model selection to infer genetic clusters. Our approach extracts rich information from genetic data, providing assignment of individuals to groups, a visual assessment of between-population differentiation, and the contribution of individual alleles to population structuring. We evaluate the performance of our method using simulated data, which were also analyzed using STRUCTURE as a benchmark. Additionally, we illustrate the method by analyzing microsatellite polymorphism in worldwide human populations and hemagglutinin gene sequence variation in seasonal influenza.
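The cluster-inference and discrimination steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pca_scores`, `kmeans`, `find_clusters`, `discriminant_axes`, `dapc`) are hypothetical, the criterion `n*log(WSS/n) + k*log(n)` is an assumed BIC approximation for K-means rather than a formula given in this abstract, and the input is a simulated toy dataset (three populations, 200 biallelic loci), not data from the paper. The sketch runs K-means for increasing k on principal-component scores, keeps the k with the lowest BIC, and then fits Fisher discriminant axes on a smaller set of retained PCs.

```python
import numpy as np


def pca_scores(X, n_pc):
    """Centre X (individuals x loci) and return its first n_pc principal-component scores."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pc] * s[:n_pc]


def kmeans(Z, k, rng, n_init=10, n_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns (labels, within-group sum of squares)."""
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        # k-means++: each new centre is drawn with probability proportional to its
        # squared distance from the nearest centre already chosen
        centres = [Z[rng.integers(len(Z))]]
        for _ in range(1, k):
            d2 = ((Z[:, None, :] - np.array(centres)[None]) ** 2).sum(axis=2).min(axis=1)
            centres.append(Z[rng.choice(len(Z), p=d2 / d2.sum())])
        centres = np.array(centres)
        for _ in range(n_iter):
            labels = ((Z[:, None, :] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        wss = ((Z - centres[labels]) ** 2).sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


def bic(wss, n, k):
    # Assumption: a common BIC approximation for K-means, not taken from the abstract
    return n * np.log(wss / n) + k * np.log(n)


def find_clusters(Z, max_k, rng):
    """Sequential K-means for k = 1..max_k; keep the k with the lowest BIC."""
    fits = {k: kmeans(Z, k, rng) for k in range(1, max_k + 1)}
    scores = {k: bic(wss, len(Z), k) for k, (_, wss) in fits.items()}
    best_k = min(scores, key=scores.get)
    return best_k, fits[best_k][0], scores


def discriminant_axes(Z, labels):
    """Fisher discriminant axes maximising between-group relative to within-group scatter."""
    groups = np.unique(labels)
    mu = Z.mean(axis=0)
    Sw = np.zeros((Z.shape[1], Z.shape[1]))
    Sb = np.zeros_like(Sw)
    for g in groups:
        Zg = Z[labels == g]
        diff = Zg - Zg.mean(axis=0)
        Sw += diff.T @ diff
        m = (Zg.mean(axis=0) - mu)[:, None]
        Sb += len(Zg) * (m @ m.T)
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor Sw = L L^T
    Linv = np.linalg.inv(np.linalg.cholesky(Sw))
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][: len(groups) - 1]
    return Linv.T @ evecs[:, order]


def dapc(X, n_pc_kmeans, n_pc_da, max_k, seed=0):
    """Hypothetical DAPC-style pipeline: infer clusters, then discriminate them on retained PCs."""
    rng = np.random.default_rng(seed)
    k, labels, _ = find_clusters(pca_scores(X, n_pc_kmeans), max_k, rng)
    Z = pca_scores(X, n_pc_da)
    return k, labels, Z @ discriminant_axes(Z, labels)


# Toy data (simulated, hypothetical): 3 populations x 20 diploid individuals x 200 biallelic
# loci, with each population's allele frequency at each locus set to 0.1 or 0.9
rng = np.random.default_rng(1)
freqs = rng.choice([0.1, 0.9], size=(3, 200))
pop = np.repeat(np.arange(3), 20)
X = rng.binomial(2, freqs[pop]).astype(float)

k, labels, scores = dapc(X, n_pc_kmeans=59, n_pc_da=10, max_k=6)
print("inferred k:", k)
print("discriminant scores shape:", scores.shape)
```

Running K-means on all non-degenerate PCs (here 59 for 60 individuals) preserves the Euclidean distances between individuals, so the clustering step sees the full data. The discriminant step deliberately keeps fewer PCs (here 10) so that the within-group scatter matrix stays invertible; with 60 individuals in 3 groups it has rank at most 57.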

CONCLUSIONS

Analysis of simulated data revealed that our approach generally performs better than STRUCTURE at characterizing population subdivision. The tools implemented in DAPC for the identification of clusters and graphical representation of between-group structures make it possible to unravel complex population structures. Our approach is also faster than Bayesian clustering algorithms by several orders of magnitude, and may be applicable to a wider range of datasets.
