Bayesian gene/species tree reconciliation and orthology analysis using MCMC.

Author information

Arvestad Lars, Berglund Ann-Charlotte, Lagergren Jens, Sennblad Bengt

Affiliation

SBC and Center for Genomics and Bioinformatics, Karolinska Institutet, SE-171 77, Stockholm, Sweden.

Publication information

Bioinformatics. 2003;19 Suppl 1:i7-15. doi: 10.1093/bioinformatics/btg1000.

Abstract

MOTIVATION

Comparative genomics in general, and orthology analysis in particular, are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation have been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas of bioinformatics, probabilistic models have proven to be both more realistic and more powerful than parsimony models. For instance, they allow solution reliability to be assessed and alternative solutions to be considered in a uniform way. Making model assumptions explicit has the added benefit of making model comparisons possible. For orthology analysis, uncertainty has recently been addressed using parsimonious reconciliation combined with bootstrap techniques. However, until now no probabilistic methods have been available.

RESULTS

We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves 'inside' a species tree. Based on this model, we develop a tool capable of practical orthology analysis, based on Fitch's original definition, and, more generally, of reconciling pairs of gene and species trees. Our gene evolution model is biologically sound (Nei et al., 1997) and intuitively attractive. We develop a Bayesian analysis based on MCMC that facilitates approximation of the a posteriori distribution over reconciliations. That is, given the observed gene tree, we can find the most probable reconciliations and estimate the probability of any reconciliation. This also gives a way to estimate the probability that a pair of genes is orthologous. The main algorithmic contribution presented here is an algorithm for computing the likelihood of a given reconciliation. To the best of our knowledge, this is the first successful introduction of this type of probabilistic method, which flourishes in phylogeny analysis, into reconciliation and orthology analysis. The MCMC algorithm has been implemented and, although it is not yet in its final form, tests show that it performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.
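As a rough illustration of how posterior samples yield an orthology probability: under Fitch's definition, two genes are orthologs when their most recent common ancestor in the gene tree arose by speciation, so one generic Monte Carlo estimate is the fraction of sampled reconciliations in which that ancestor is labeled a speciation. The Python sketch below shows only this counting step. It is not the authors' implementation. The tree encoding (a child-to-parent dictionary), the per-sample labels, and the functions `lca` and `orthology_probability` are all hypothetical, and it assumes the samples were already drawn from the chain after burn-in.

```python
from typing import Dict, List, Optional

# Hypothetical encoding: the (fixed) gene tree as a child -> parent map,
# with the root mapped to None.
ParentMap = Dict[str, Optional[str]]
# Hypothetical encoding of one sampled reconciliation: each internal
# gene-tree node labeled "speciation" or "duplication".
Sample = Dict[str, str]


def ancestors(node: str, parent: ParentMap) -> List[str]:
    """Return the path from node up to the root, node included."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: ParentMap) -> str:
    """Lowest common ancestor of a and b in the gene tree."""
    anc_a = set(ancestors(a, parent))
    for v in ancestors(b, parent):
        if v in anc_a:
            return v
    raise ValueError("nodes are not in the same tree")


def orthology_probability(a: str, b: str, parent: ParentMap,
                          samples: List[Sample]) -> float:
    """Fraction of sampled reconciliations in which LCA(a, b) is a speciation."""
    if not samples:
        raise ValueError("need at least one sample")
    v = lca(a, b, parent)
    hits = sum(1 for s in samples if s[v] == "speciation")
    return hits / len(samples)


# Toy gene tree ((h1, m1)x, m2)r with four made-up posterior samples.
parent: ParentMap = {"h1": "x", "m1": "x", "x": "r", "m2": "r", "r": None}
samples: List[Sample] = [
    {"x": "speciation", "r": "duplication"},
    {"x": "speciation", "r": "duplication"},
    {"x": "duplication", "r": "duplication"},
    {"x": "speciation", "r": "duplication"},
]
print(orthology_probability("h1", "m1", parent, samples))  # 0.75
print(orthology_probability("h1", "m2", parent, samples))  # 0.0
```

Because the gene tree is held fixed, the LCA is computed once and only its label varies across samples. The estimate's accuracy therefore depends entirely on how well the chain has mixed.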
