Alexandrino Alexsandro Oliveira, Oliveira Andre Rodrigues, Jean Géraldine, Fertin Guillaume, Dias Ulisses, Dias Zanoni
Institute of Computing, University of Campinas, Campinas, Brazil.
Computing and Informatics Department, Mackenzie Presbyterian University, Barueri, Brazil.
J Comput Biol. 2023 Aug;30(8):861-876. doi: 10.1089/cmb.2023.0087. Epub 2023 May 24.
The most common way to calculate the rearrangement distance between two genomes is to take the length of a shortest sequence of rearrangements that transforms one genome into the other. This assumes that both genomes have the same gene content, so each genome can be represented as a permutation that encodes only its gene order. As research in genome rearrangements has advanced, new works have extended the classical models in two ways. Some consider genomes with different gene content (unbalanced genomes). Others add more genomic characteristics to the mathematical representation of the genomes, such as the distribution of intergenic region sizes. In this study, we investigate the Reversal, Transposition, and Indel (Insertion and Deletion) Distance using intergenic information. Because indels are included in the rearrangement model (i.e., the set of rearrangements allowed when computing the distance), this distance can be used to compare unbalanced genomes. For the particular case of transpositions and indels on unbalanced genomes, we present a 4-approximation algorithm, improving on a previous 4.5-approximation. We then extend this algorithm to handle gene orientation while keeping the approximation factor of 4 for the Reversal, Transposition, and Indel Distance on unbalanced genomes. Finally, we evaluate the proposed algorithms through experiments on simulated data.
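To make the classical definition in the first sentence concrete, here is a minimal brute-force sketch. It assumes balanced genomes written as tuples of signed integers, where the sign gives gene orientation, and it allows only signed reversals and transpositions. It does not model intergenic region sizes or indels, and it is not the 4-approximation algorithm described above. The function names (`reversal`, `transposition`, `rt_distance`) are hypothetical. The search is exact but exponential (there are 2^n · n! signed permutations), so it is only practical for a handful of genes.

```python
from collections import deque


def reversal(perm, i, j):
    # Reverse the block perm[i..j] (inclusive) and flip the orientation of every gene in it.
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def transposition(perm, i, j, k):
    # Exchange the adjacent blocks perm[i:j] and perm[j:k], with 0 <= i < j < k <= n.
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def neighbours(perm):
    # Every genome reachable from perm by a single reversal or transposition.
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield reversal(perm, i, j)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                yield transposition(perm, i, j, k)


def rt_distance(source, target):
    """Length of a shortest reversal/transposition sequence turning source into target (BFS)."""
    source, target = tuple(source), tuple(target)
    # The classical model assumes the same gene content in both genomes.
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must have the same gene content")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        for nxt in neighbours(perm):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    # Not reached: signed reversals alone can sort any signed permutation.
    raise RuntimeError("target unreachable")
```

For example, `rt_distance((2, 3, 1), (1, 2, 3))` is 1, because one transposition moves gene 1 to the front. `rt_distance((3, 2, 1), (1, 2, 3))` is 2, because any single reversal leaves some gene with negative orientation. Breadth-first search is used because it returns the minimum number of operations directly. Supporting unbalanced genomes would require adding insertion and deletion moves, which this sketch deliberately omits.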