Devillers Hugo, Chiapello Hélène, Schbath Sophie, El Karoui Meriem
Mathématique, Informatique et Génome, INRA, UR1077, Jouy-en-Josas, France.
J Comput Biol. 2011 Sep;18(9):1155-65. doi: 10.1089/cmb.2011.0115.
Comparison of closely related bacterial genomes has revealed the presence of highly conserved sequences forming a "backbone" that is interrupted by numerous, less conserved, DNA fragments. Segmentation of bacterial genomes into backbone and variable regions is particularly useful to investigate, among other things, bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these software tools and to compare the results obtained with them. To fill this gap, we have developed two local scores to measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The two scores described in this article provide useful information and are easy to implement, and their interpretation is intuitive. We show that they are suited to discriminate between robust and non-robust segmentations when genome aligners such as MAUVE and MGA are used.