Alexandrino Alexsandro Oliveira, Oliveira Andre Rodrigues, Dias Ulisses, Dias Zanoni
Institute of Computing, University of Campinas, Campinas, Brazil.
School of Technology, University of Campinas, Limeira, Brazil.
J Comput Biol. 2022 Mar;29(3):243-256. doi: 10.1089/cmb.2021.0279. Epub 2021 Nov 1.
In the comparative genomics field, one way to infer the evolutionary distance between two organisms of related species is by finding the minimum number of large-scale mutations, called genome rearrangements, that transform one genome into the other. This number is referred to as the . Since problems in this area emerged in the mid-1990s, several genome rearrangements have been proposed. Rearrangements that do not alter the genome content are called conservative, and in this group we have the following: the , which inverts a segment of the genome; the , which exchanges two consecutive segments; and the , which cuts two different pairs of adjacent blocks and joins them differently. Seminal works compared genomes sharing the same set of conserved blocks, but nowadays, researchers started looking at genomes with unequal gene content, by allowing the use of nonconservative rearrangements such as and (jointly called ). The transposition distance and the transposition and indel distance are both NP-hard. We investigate the transposition and indel distance and present a structure called labeled cycle graph, representing an instance of rearrangement distance problems for genomes with unequal gene content. This structure is used to devise a lower bound and a 2-approximation algorithm for the transposition and indel distance.
在比较基因组学领域,推断两个相关物种的生物体之间进化距离的一种方法是找到将一个基因组转化为另一个基因组所需的大规模突变的最小数量,这些大规模突变被称为基因组重排。这个数量被称为 。自20世纪90年代中期该领域出现问题以来,已经提出了几种基因组重排。不改变基因组内容的重排被称为保守重排,在这一类中我们有以下几种: ,它反转基因组的一段; ,它交换两个连续的片段;以及 ,它切割两对不同的相邻块并以不同方式连接它们。开创性的工作比较了共享相同保守块集的基因组,但如今,研究人员开始研究基因含量不等的基因组,方法是允许使用非保守重排,如 和 (统称为 )。转座距离以及转座和插入缺失距离都是NP难问题。我们研究转座和插入缺失距离,并提出一种称为标记循环图的结构,它代表了基因含量不等的基因组重排距离问题的一个实例。这种结构用于为转座和插入缺失距离设计一个下界和一个2近似算法。