Zhang Hongmei, Gu Xun
University of West Florida, USA.
Stat Appl Genet Mol Biol. 2004;3:Article31. doi: 10.2202/1544-6115.1060. Epub 2004 Nov 14.
With the rapid growth of whole-genome data, reconstructing the phylogenetic relationships among genomes has become an active topic in comparative genomics. The maximum likelihood approach is one of several available methods and has been very successful. However, no study has reported its application to genome tree construction, mainly because the probability model lacks an analytical form and/or the computational burden is heavy. In this paper we study the mathematical structure of the stochastic model of genome evolution and develop a simplified likelihood function for observing a specific phylogenetic pattern in the four-genome case using gene content information. We then use the maximum likelihood approach to identify phylogenetic trees. Simulation results indicate that the proposed method works well and identifies the correct tree at a high rate. Application to real data gives satisfactory results. The approach developed in this paper can serve as a basis for reconstructing phylogenies of more than four genomes.
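As a rough illustration of maximum likelihood tree selection for four genomes from gene content data, the sketch below scores the three possible unrooted topologies against counts of gene presence/absence patterns and picks the best one. It is not the likelihood function developed in the paper: `toy_pattern_probs` and its `weight` parameter are hypothetical placeholders that simply give extra probability to patterns shared by one pair of sister genomes, and the genome labels A to D and the example counts are made up for the illustration.

```python
import math
from itertools import product

GENOMES = "ABCD"
# The three unrooted topologies for four genomes.
TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")

# Presence/absence patterns over (A, B, C, D); "0000" is excluded because
# a gene absent from every genome cannot be observed.
PATTERNS = ["".join(bits) for bits in product("01", repeat=4) if "1" in bits]


def toy_pattern_probs(topology, weight=3.0):
    """Hypothetical pattern probabilities for one topology (NOT the paper's model).

    Patterns in which a gene is present in exactly one of the two sister
    pairs get `weight`; every other pattern gets 1. Weights are normalized.
    """
    pair = {GENOMES.index(g) for g in topology.split("|")[0]}
    other = set(range(4)) - pair
    weights = {}
    for p in PATTERNS:
        present = {i for i, b in enumerate(p) if b == "1"}
        weights[p] = weight if present in (pair, other) else 1.0
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def log_likelihood(counts, probs):
    """Multinomial log-likelihood of the pattern counts, up to a constant."""
    return sum(n * math.log(probs[p]) for p, n in counts.items())


def best_topology(counts):
    """Return the topology with the highest log-likelihood and all scores."""
    scores = {t: log_likelihood(counts, toy_pattern_probs(t)) for t in TOPOLOGIES}
    return max(scores, key=scores.get), scores


# Made-up counts: many genes are shared only by A and B, or only by C and D.
example_counts = {"1100": 30, "0011": 25, "1010": 5, "1111": 40}
chosen, all_scores = best_topology(example_counts)
```

In a real analysis, the placeholder probabilities would be replaced by pattern probabilities derived from a stochastic model of gene gain and loss, with parameters estimated for each candidate tree before the likelihoods are compared.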