Guyeux Christophe, Al-Nuaimi Bashar, AlKindy Bassam, Couchot Jean-François, Salomon Michel
FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Science Department, Univ. Bourgogne Franche-Comté (UBFC), 16 Route de Gray, Besançon, 25000, France.
Department of Computer Science, Diyala University, Diyala, 32001, Iraq.
BMC Syst Biol. 2018 Nov 20;12(Suppl 5):100. doi: 10.1186/s12918-018-0618-2.
To reconstruct the evolutionary history of DNA sequences, we need novel models of increasing complexity in the number of free parameters describing sequence evolution, together with faster and more accurate algorithms and statistical and computational methods. In particular, because genome rearrangements (translocations, fusions, and so on) are the principal forces behind major structural changes, understanding their underlying mechanisms, notably through ancestral genome reconstruction, is essential. Since finding the ancestral genomes that minimize the number of rearrangements in a phylogenetic tree is known to be NP-hard for three or more genomes, heuristics are commonly used to approximate the exact solution. The aim of this work is to show that another path is possible.
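To make the three-genome case concrete, here is a minimal sketch of an exhaustive search for an ancestral gene order. It is not the method of this article. It assumes genomes are signed linear gene orders over the same genes, and it replaces the rearrangement count with the simpler breakpoint distance, which is a swapped-in measure chosen for brevity. The function names (`adjacencies`, `breakpoint_distance`, `brute_force_median`) and the toy genomes are hypothetical.

```python
from itertools import permutations, product


def adjacencies(genome):
    """Return the set of adjacencies of a signed linear gene order.

    The genome is framed by 0 and n + 1 so that its two ends count too.
    """
    n = len(genome)
    framed = [0] + list(genome) + [n + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        # (a, b) and (-b, -a) are the same adjacency read on opposite strands;
        # keep one canonical representative.
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are absent from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def brute_force_median(leaves):
    """Exhaustively search all signed gene orders for one that minimizes
    the summed breakpoint distance to the given leaf genomes.

    The search space has n! * 2**n candidates, so this is only feasible
    for toy sizes; it illustrates why heuristics are the usual choice.
    """
    n = len(leaves[0])
    best_score, best_genome = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            candidate = [s * g for s, g in zip(signs, order)]
            score = sum(breakpoint_distance(candidate, leaf) for leaf in leaves)
            if best_score is None or score < best_score:
                best_score, best_genome = score, candidate
    return best_score, best_genome


if __name__ == "__main__":
    # Toy leaves: the second and third each carry one single-gene inversion.
    leaves = [[1, 2, 3], [1, -2, 3], [1, 2, -3]]
    score, ancestor = brute_force_median(leaves)
    print(score, ancestor)
```

With n genes the search visits n! * 2^n candidates (48 for n = 3, more than 3.7 billion for n = 10), so exhaustive enumeration of this kind does not scale to real genomes.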
Various algorithms and software tools already address the difficult problem of ancestral genome reconstruction, but they lack precision, in particular when indels or single nucleotide polymorphisms fall within repeated sequences. In this article, despite the theoretical NP-hardness of the ancestral reconstruction problem, we show that an exact solution can be found in practice in various cases, encompassing organelles and some bacteria. A practical example shows that an accurate reconstruction, which also makes it possible to highlight homoplasic events, can be obtained. This is illustrated by the reconstruction of ancestral genomes of two bacterial pathogens belonging to the Mycobacterium and Brucella genera.
By combining automatically reconstructed ancestral regions with manually reconstructed ones for problematic cases, we show that an accurate reconstruction of the ancestors of the Brucella genus and of the Mycobacterium tuberculosis complex is possible. This allows us to investigate the evolutionary history of each pathogen by computing its common ancestors. These ancestors can then be studied extensively in terms of gene content evolution over time, resistance acquisition, and the impact of mobile elements on genome plasticity.