Huson Daniel H, Steel Mike
Center for Bioinformatics, Tübingen University, Sand 14, 72076 Tübingen, Germany.
Bioinformatics. 2004 Sep 1;20(13):2044-9. doi: 10.1093/bioinformatics/bth198. Epub 2004 Mar 25.
Comparing gene content between species can be a useful approach for reconstructing phylogenetic trees. In this paper, we derive a maximum-likelihood estimation of evolutionary distance between species under a simple model of gene genesis and gene loss. Using simulated data on a biological tree with 107 taxa (and on a number of randomly generated trees), we compare the accuracy of tree reconstruction using this ML distance measure to an earlier ad hoc distance. We then compare these distance-based approaches to a character-based tree reconstruction method (Dollo parsimony) which seems well suited to the analysis of gene content data. To simplify simulations, we give a formal proof of the well-known 'fact' that the Dollo parsimony score is independent of the choice of root. Our results show a consistent trend, with the character-based method and ML distance measure outperforming the earlier ad hoc distance method.
http://www.ab.informatik.uni-tuebingen.de/software/genecontent/welcome_en.html
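The root-independence claim for the Dollo parsimony score can be illustrated with a small sketch. This is not the authors' implementation. It assumes one common convention: a gene may arise only once, so the nodes carrying the gene must form the smallest connected subtree spanning the leaves that have it, and the score counts the edges whose two endpoints differ in state (the root state is left unconstrained). The function name `dollo_score`, the edge-list tree encoding, and the example tree are hypothetical choices made for illustration.

```python
from collections import defaultdict


def dollo_score(edges, ones, root):
    """Count state-changing edges for one binary character.

    edges: list of (u, v) pairs describing an unrooted tree.
    ones:  set of leaves in which the gene is present.
    root:  node used to orient the traversal (assumed convention:
           the root's own state is unconstrained).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    total = len(ones)
    if total == 0:
        return 0  # gene absent everywhere: no change needed

    # Iterative DFS from the chosen root: record parents and a pre-order.
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)

    # cnt[v] = number of gene-carrying leaves at or below v.
    cnt = {v: (1 if v in ones else 0) for v in order}
    for v in reversed(order):  # children are visited before their parents
        if parent[v] is not None:
            cnt[parent[v]] += cnt[v]

    # An edge lies on a path between two gene-carrying leaves exactly when
    # such leaves occur on both of its sides; its endpoints must carry the gene.
    present = set(ones)
    for v in order:
        p = parent[v]
        if p is not None and 0 < cnt[v] < total:
            present.update((v, p))

    # Score: edges whose endpoints differ in state (one gain plus the losses,
    # or losses only when the root itself carries the gene).
    return sum((u in present) != (v in present) for u, v in edges)


# Hypothetical five-leaf tree: leaves A..E, internal nodes x, y, z.
EDGES = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

if __name__ == "__main__":
    nodes = sorted({n for e in EDGES for n in e})
    for r in nodes:
        print(r, dollo_score(EDGES, {"A", "C"}, r))
```

Under this convention, every choice of root on the example tree yields the same score for a given presence pattern, because the rooted traversal always recovers the same connected subtree. Summing the per-gene scores over all genes would give the score of a full gene content data set.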