Brito Klairton L, Oliveira Andre R, Alexandrino Alexsandro O, Dias Ulisses, Dias Zanoni
Institute of Computing, University of Campinas, 1251 Albert Einstein Ave., 13083-852, Campinas, Brazil.
School of Technology, University of Campinas, 1888 Paschoal Marmo St., 13484-332, Limeira, Brazil.
Algorithms Mol Biol. 2021 Dec 29;16(1):24. doi: 10.1186/s13015-021-00203-7.
One of the goals of comparative genomics is to estimate a sequence of genetic changes capable of transforming one genome into another. Genome rearrangement events are mutations that can alter the genetic content of a genome or the arrangement of its elements. Reversals and transpositions are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome, while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information into the model, in particular the sizes of intergenic regions, which are the structures between each pair of consecutive genes and at the extremities of a linear genome.
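The two events described above can be illustrated with a minimal Python sketch. It assumes a common way of modeling intergenic rearrangements: a genome is a list of genes plus a list of intergenic region sizes with one more entry than there are genes, and each event cuts the affected regions at chosen offsets before reassembling them. The function names, the 0-based indexing, and the `signed` flag are illustrative choices, not taken from the paper.

```python
def reversal(genes, regions, i, j, x, y, signed=True):
    """Invert genes[i..j] (inclusive). regions[k] is the intergenic region
    just before genes[k]; regions[-1] follows the last gene.
    The region before the segment is cut at offset x, the one after at y."""
    assert 0 <= i <= j < len(genes)
    assert 0 <= x <= regions[i] and 0 <= y <= regions[j + 1]
    segment = list(reversed(genes[i:j + 1]))
    if signed:  # known gene orientation: a reversal flips each sign
        segment = [-g for g in segment]
    new_genes = genes[:i] + segment + genes[j + 1:]
    inner = regions[i + 1:j + 1][::-1]  # regions inside the segment
    new_regions = (regions[:i] + [x + y] + inner
                   + [(regions[i] - x) + (regions[j + 1] - y)]
                   + regions[j + 2:])
    return new_genes, new_regions


def transposition(genes, regions, i, j, k, x, y, z):
    """Swap the consecutive segments genes[i:j] and genes[j:k].
    regions[i], regions[j], regions[k] are cut at offsets x, y, z."""
    assert 0 <= i < j < k <= len(genes)
    assert 0 <= x <= regions[i] and 0 <= y <= regions[j] and 0 <= z <= regions[k]
    new_genes = genes[:i] + genes[j:k] + genes[i:j] + genes[k:]
    new_regions = (regions[:i]
                   + [x + (regions[j] - y)] + regions[j + 1:k]
                   + [z + (regions[i] - x)] + regions[i + 1:j]
                   + [y + (regions[k] - z)] + regions[k + 1:])
    return new_genes, new_regions


# Toy genome with 4 genes and 5 intergenic regions (sizes are made up).
genes, regions = [1, 2, 3, 4], [3, 2, 4, 6, 5]
rev_genes, rev_regions = reversal(genes, regions, 1, 2, x=1, y=4)
tr_genes, tr_regions = transposition(genes, regions, 0, 1, 3, x=2, y=1, z=0)
```

In this sketch both events only redistribute intergenic material, so the total size of the intergenic regions is unchanged by either of them.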
In this work, we investigate the SORTING BY INTERGENIC REVERSALS AND TRANSPOSITIONS problem on genomes sharing the same set of genes, considering both the case where the orientation of the genes is known and the case where it is unknown. We also explore a variant of the problem that generalizes the transposition event. As a result, we present an approximation algorithm with an approximation factor of 4 for both cases under the reversal and transposition (classic definition) events, an improvement over the previously known 4.5-approximation for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm that incorporates the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. Practical tests on simulated data indicate that the algorithms, in both cases, tend to perform better than the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.