Chauve Cedric, Tannier Eric
Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada.
PLoS Comput Biol. 2008 Nov;4(11):e1000234. doi: 10.1371/journal.pcbi.1000234. Epub 2008 Nov 28.
The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, studied by both cytogeneticists and bioinformaticians. The two approaches were recently compared and discussed in a series of papers, sometimes with diverging points of view regarding their performance. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to the physical mapping of chromosomes and benefits from the combinatorial tools developed in that field. We develop this framework into a new reconstruction method that considers conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement the method and apply it to datasets of mammalian genomes. We perform extensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segment reconstruction. We show that the proposed method is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results ultimately come very close to those of cytogenetic studies. This suggests that comparisons of methods for ancestral genome reconstruction should consider the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition.
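The link to physical mapping can be illustrated with a small sketch. A classical combinatorial question in physical mapping is whether a set of markers can be placed in a linear order such that every given group of markers occupies consecutive positions (the consecutive ones property). The abstract does not say which tools the method uses, so the code below is only an illustration under that assumption, not the authors' implementation: the function names, the toy markers, and the brute-force search are all hypothetical, and the search is practical only for a handful of markers.

```python
from itertools import permutations


def is_contiguous(order, cluster):
    """Return True if the markers of `cluster` occupy consecutive positions in `order`."""
    if len(cluster) < 2:
        return True  # an empty or single-marker cluster is trivially contiguous
    positions = sorted(order.index(marker) for marker in cluster)
    return positions[-1] - positions[0] + 1 == len(positions)


def find_consistent_order(markers, clusters):
    """Brute-force search for a marker order in which every cluster is contiguous.

    Returns the first such order as a list, or None if no order exists.
    Hypothetical helper for illustration; it tries every permutation.
    """
    for order in permutations(markers):
        if all(is_contiguous(order, cluster) for cluster in clusters):
            return list(order)
    return None


# Toy data (invented): three overlapping clusters that fit on one segment.
compatible = find_consistent_order(
    ["a", "b", "c", "d"],
    [{"a", "b"}, {"b", "c"}, {"c", "d"}],
)

# Toy data (invented): three pairwise clusters that cannot all be contiguous.
conflicting = find_consistent_order(
    ["a", "b", "c"],
    [{"a", "b"}, {"b", "c"}, {"a", "c"}],
)
```

In the first toy case an order exists, so the clusters are mutually compatible with a single linear segment. In the second, no order satisfies all three clusters, which is the kind of conflict a reconstruction method has to resolve, for instance by discarding weakly supported clusters.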