Avdeyev Pavel, Jiang Shuai, Aganezov Sergey, Hu Fei, Alekseyev Max A
1 Computational Biology Institute & Department of Mathematics, The George Washington University, Washington, DC, U.S.A.
2 Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, U.S.A.
J Comput Biol. 2016 Mar;23(3):150-64. doi: 10.1089/cmb.2015.0160. Epub 2016 Feb 17.
Because the most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it is crucial to understand their mechanisms and to reconstruct the ancestral genomes of a given set of genomes. The reconstruction problem was shown to be NP-complete even in the "simplest" case of three genomes, which calls for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstructing the ancestral genomes of multiple genomes. One of the key obstacles for MGRA and similar tools is the presence of breakpoint reuse, in which the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene content. The resulting next-generation tool, MGRA2, can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. In addition, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and to process datasets that MGRA cannot handle. In practical experiments on simulated and real genomes, MGRA2 shows superior performance compared with other ancestral genome reconstruction tools.