Al Arab Marwa, Bernt Matthias, Höner Zu Siederdissen Christian, Tout Kifah, Stadler Peter F
Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.
Faculty of Sciences I, Lebanese University, Hadath, Beirut, Lebanon.
Algorithms Mol Biol. 2017 Aug 23;12:22. doi: 10.1186/s13015-017-0113-0. eCollection 2017.
Genomic DNA frequently undergoes rearrangements of gene order, which can be localized by comparing two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. To study such effects it is important to locate the breakpoint positions with precision.
We define a partially local sequence alignment problem that assumes that, following a rearrangement of a reference sequence, two fragments are produced that may exactly fit together to match the reference, leave a gap of deleted DNA between them, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications.
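The three outcomes described above (exact fit, gap, overlap) can be illustrated with a deliberately simplified sketch. It is not the cubic three-way dynamic program of the paper: it aligns each fragment to the reference independently in quadratic time, anchoring the left fragment at the start of the reference and the right fragment at its end, and leaving the breakpoint-side end of each alignment free. The function names, the variable names `F`, `L`, `R`, and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for illustration only.

```python
def anchored_best_end(a, f, match=1, mismatch=-1, gap=-1):
    """Align a prefix of `a` to a prefix of `f`, anchored at position 0.

    Returns (score, i, j): the best-scoring alignment of a[:i] with f[:j].
    The right end is free, so unaligned tails cost nothing.
    """
    m = len(f)
    prev = [j * gap for j in range(m + 1)]  # row 0: f[:j] against nothing
    best = (0, 0, 0)                        # the empty alignment scores 0
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if a[i - 1] == f[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:            # strict: keep the earliest optimum
                best = (cur[j], i, j)
        prev = cur
    return best


def classify_breakpoint(F, L, R):
    """Classify how fragments L and R sit on the reference F.

    L is anchored at the start of F, R at the end of F (handled by
    reversing both strings). Returns (kind, l_end, r_start), where
    F[:l_end] is covered by L and F[r_start:] is covered by R.
    """
    _, _, l_end = anchored_best_end(L, F)
    _, _, r_len = anchored_best_end(R[::-1], F[::-1])
    r_start = len(F) - r_len
    if r_start == l_end:
        kind = "exact"
    elif r_start > l_end:
        kind = "gap"        # reference positions l_end..r_start-1 are lost
    else:
        kind = "overlap"    # reference positions r_start..l_end-1 occur twice
    return kind, l_end, r_start


F = "AAAACCCCGGGGTTTT"
L = "AAAACCCC" + "TATA"                     # reference prefix plus unrelated tail
print(classify_breakpoint(F, L, "ATAT" + "GGGGTTTT"))
print(classify_breakpoint(F, L, "ATAT" + "GGTTTT"))
print(classify_breakpoint(F, L, "ATAT" + "CCGGGGTTTT"))
```

Because the two fragments are scored independently here, an overlapping region is never scored jointly across all three sequences; handling that case consistently is what motivates a three-way formulation with cubic cost.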
The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. Although it is applied here only to mitogenomes, there is no reason why the method could not also be used to study rearrangements in nuclear genomes.