Department of Computer Science, School of Information Technology and Engineering, University of Ottawa, Ottawa K1N 6N5, Canada.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Oct-Dec;7(4):579-87. doi: 10.1109/TCBB.2010.66.
There has been a trend toward increasing the phylogenetic scope of genome sequencing while decreasing the quality of the published sequence for each genome. With reduced finishing effort, an increasing number of genomes are being published in contig form. Rearrangement algorithms, including gene order-based phylogenetic tools, require whole-genome data on gene order, segment order, or some other marker order. Items whose chromosomal location is unknown cannot be part of the input. The question we address here is: for gene order-based phylogenetic analysis, how can we use rearrangement algorithms to handle genomes available in contig form only? Our suggestion is to use the contigs directly in the rearrangement algorithms as if they were chromosomes, while making a number of corrections. For example, we correct for the number of extra fusion/fission operations required to make contigs comparable to full assemblies. We model the relationship between contig number and genomic distance, and estimate the parameters of this model using insect genome data. With this model, we use distance matrix methods to reconstruct the phylogeny based on genomic distance and numbers of contigs. We compare this with methods that reconstruct ancestral gene orders using uncorrected contig data.
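The idea of correcting contig-based distances before building a distance matrix can be illustrated with a minimal sketch. The abstract does not state the form of the model, so the linear relationship used here (raw distance = intercept + slope × total contig count of the two genomes) is purely an assumption for illustration, as are the function names `fit_contig_effect` and `correct_distances` and the toy numbers, which are made up and are not the insect data.

```python
from itertools import combinations


def fit_contig_effect(raw, contigs):
    """Ordinary least-squares fit of raw[{a, b}] ~ alpha + beta * (contigs[a] + contigs[b]).

    ASSUMPTION: a linear contig effect; the paper's actual model may differ.
    """
    xs, ys = [], []
    for a, b in combinations(sorted(contigs), 2):
        xs.append(contigs[a] + contigs[b])
        ys.append(raw[frozenset((a, b))])
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def correct_distances(raw, contigs, beta):
    """Subtract the estimated contig-induced inflation from each pairwise distance."""
    corrected = {}
    for pair, d in raw.items():
        a, b = tuple(pair)
        # Clamp at zero so a corrected distance is never negative.
        corrected[pair] = max(0.0, d - beta * (contigs[a] + contigs[b]))
    return corrected


# Toy input (hypothetical): contig counts and raw distances computed by
# treating each contig as if it were a chromosome.
contigs = {"g1": 10, "g2": 40, "g3": 100}
raw = {
    frozenset(("g1", "g2")): 45.0,
    frozenset(("g1", "g3")): 75.0,
    frozenset(("g2", "g3")): 90.0,
}

alpha, beta = fit_contig_effect(raw, contigs)
corrected = correct_distances(raw, contigs, beta)
```

The corrected dictionary could then be passed to any distance matrix method, such as neighbor joining. Fitting a single intercept across all pairs is a simplification chosen to keep the sketch short.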