A fast likelihood solution to the genetic clustering problem.

Author information

Beugin Marie-Pauline, Gayet Thibault, Pontier Dominique, Devillard Sébastien, Jombart Thibaut

Affiliations

Univ Lyon, Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard Lyon 1, Villeurbanne, France.

ANTAGENE, Animal Genomics Laboratory, La Tour de Salvagny, France.

Publication information

Methods Ecol Evol. 2018 Apr;9(4):1006-1016. doi: 10.1111/2041-210X.12968. Epub 2018 Jan 30.

Abstract

The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model-based methods, which are usually computationally intensive but yield results that can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but considerably faster.

Here, we introduce a fast maximum-likelihood solution to the genetic clustering problem, which combines the advantages of both model-based and geometric approaches. Our method relies on maximising the likelihood of a fixed number of panmictic populations by combining a geometric approach with fast likelihood optimisation via the Expectation-Maximisation (EM) algorithm. It can be used for assigning genotypes to populations and, optionally, for identifying various types of hybrids between two parental populations. Several goodness-of-fit statistics can also be used to guide the choice of the retained number of clusters.

Using extensive simulations, we show that the method performs comparably to current gold standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model-based methods. We also illustrate how it can be used for identifying the optimal number of clusters, and subsequently for assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset.

The method is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of co-dominant markers, and can easily be extended to more complex models including, for instance, varying ploidy levels. Given its flexibility and computational efficiency, it provides a useful complement to the existing toolbox for the study of genetic diversity in natural populations.
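To make the idea of likelihood-based clustering with EM more concrete, here is a minimal Python sketch. It is not the adegenet implementation (which is written for R), and several choices are assumptions made for illustration only: diploid individuals, independent loci, Hardy-Weinberg proportions within each cluster, equal prior weight for every cluster, soft (probabilistic) memberships, and a small pseudo-count to avoid zero allele frequencies. The function name `em_genetic_clustering` and its parameters are hypothetical. In the abstract, the starting groups come from a geometric approach; here they are simply supplied by hand.

```python
import math


def em_genetic_clustering(genotypes, n_alleles, init_groups, k,
                          n_iter=100, tol=1e-8, pseudo=0.01):
    """Soft EM clustering of diploid genotypes (illustrative sketch).

    genotypes[i][l] is a pair (a, b) of allele indices for individual i at locus l.
    n_alleles[l] is the number of distinct alleles at locus l.
    init_groups[i] is the starting cluster (0..k-1) of individual i.
    Returns (membership probabilities, final log-likelihood).
    """
    n = len(genotypes)
    n_loci = len(n_alleles)
    # Start from the hard initial assignment: probability 1 for the given group.
    resp = [[1.0 if init_groups[i] == g else 0.0 for g in range(k)]
            for i in range(n)]
    loglik = -math.inf

    for _ in range(n_iter):
        # M-step: membership-weighted allele frequencies per cluster and locus.
        # The pseudo-count (an assumption of this sketch) keeps frequencies > 0.
        freqs = []
        for g in range(k):
            per_locus = []
            for l in range(n_loci):
                counts = [pseudo] * n_alleles[l]
                for i in range(n):
                    a, b = genotypes[i][l]
                    counts[a] += resp[i][g]
                    counts[b] += resp[i][g]
                total = sum(counts)
                per_locus.append([c / total for c in counts])
            freqs.append(per_locus)

        # E-step: genotype log-likelihood in each cluster under Hardy-Weinberg,
        # then normalise across clusters to get membership probabilities.
        new_loglik = 0.0
        for i in range(n):
            logp = []
            for g in range(k):
                lp = 0.0
                for l in range(n_loci):
                    a, b = genotypes[i][l]
                    p = freqs[g][l][a] * freqs[g][l][b]
                    if a != b:
                        p *= 2.0  # heterozygotes arise in two orders
                    lp += math.log(p)
                logp.append(lp)
            m = max(logp)  # subtract the maximum for numerical stability
            weights = [math.exp(x - m) for x in logp]
            s = sum(weights)
            resp[i] = [w / s for w in weights]
            # Mixture log-likelihood with equal cluster weights 1/k.
            new_loglik += m + math.log(s) - math.log(k)

        if abs(new_loglik - loglik) < tol:
            loglik = new_loglik
            break
        loglik = new_loglik

    return resp, loglik


# Toy data: 12 individuals, 5 biallelic loci, two clearly distinct groups.
# Two individuals (index 5 and 11) start in the wrong group on purpose.
demo_genotypes = [[(0, 0)] * 5 for _ in range(6)] + [[(1, 1)] * 5 for _ in range(6)]
demo_init = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
demo_resp, demo_loglik = em_genetic_clustering(demo_genotypes, [2] * 5, demo_init, k=2)
demo_labels = [max(range(2), key=lambda g: r[g]) for r in demo_resp]
print(demo_labels, round(demo_loglik, 3))
```

Working in log space and subtracting the per-individual maximum before exponentiating avoids underflow when many loci are multiplied together. The final log-likelihood returned here is the kind of quantity that goodness-of-fit statistics for choosing the number of clusters are typically built on, although the exact statistics used by the method are not specified in the abstract.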

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/634a/5993310/345bd351da72/MEE3-9-1006-g001.jpg
