Institute of Computing, University of Campinas, 1251 Albert Einstein Ave., 13083-852 Campinas, São Paulo, Brazil.
School of Technology, University of Campinas, 1888 Paschoal Marmo St., 13484-332 Limeira, São Paulo, Brazil.
J Bioinform Comput Biol. 2021 Dec;19(6):2140011. doi: 10.1142/S0219720021400114. Epub 2021 Nov 13.
Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition leaves at least two important considerations open: first, which features are extracted from genomes to create a useful mathematical model, and second, which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field used gene order alone, representing genomes as permutations of integers and neglecting many important aspects such as gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events, such as reversals and transpositions, are widely studied. In this paper, we address the first consideration with a model that takes into account both gene order and the number of nucleotides in intergenic regions. In addition, we consider reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation algorithm for reversals, transpositions, and indels.
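To make the kind of model described above concrete, the following is a minimal Python sketch of a genome represented by gene order plus intergenic region sizes, with one reversal and one indel operation. It is an illustration under stated assumptions, not the paper's formal definitions or its approximation algorithms: the names `Genome`, `gaps`, `reversal`, and `indel` are hypothetical, genes are treated as unsigned, a reversal is assumed to cut inside two intergenic regions at offsets `x` and `y`, and an indel is assumed to change the size of a single intergenic region only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Genome:
    """Hypothetical representation: gene order plus intergenic sizes.

    genes: gene order as integers (unsigned; an assumption of this sketch).
    gaps:  gaps[k] is the number of nucleotides between genes[k-1] and
           genes[k]; gaps[0] and gaps[-1] are the two end regions.
    """
    genes: List[int]
    gaps: List[int]

    def __post_init__(self) -> None:
        if len(self.gaps) != len(self.genes) + 1:
            raise ValueError("need exactly len(genes) + 1 intergenic regions")
        if any(size < 0 for size in self.gaps):
            raise ValueError("intergenic sizes must be non-negative")


def reversal(g: Genome, i: int, j: int, x: int, y: int) -> Genome:
    """Reverse genes[i..j] (inclusive, 0-based).

    Assumed cut positions: gaps[i] is cut after its first x nucleotides
    and gaps[j + 1] after its first y nucleotides. Everything between the
    two cuts is reversed, so the total nucleotide count is preserved.
    """
    if not 0 <= i <= j < len(g.genes):
        raise ValueError("invalid segment")
    if not (0 <= x <= g.gaps[i] and 0 <= y <= g.gaps[j + 1]):
        raise ValueError("cut offsets must lie inside the intergenic regions")
    x_rest = g.gaps[i] - x          # part of gaps[i] inside the reversed segment
    y_rest = g.gaps[j + 1] - y      # part of gaps[j + 1] outside the segment
    genes = g.genes[:i] + g.genes[i:j + 1][::-1] + g.genes[j + 1:]
    inner = g.gaps[i + 1:j + 1][::-1]  # regions strictly inside the segment
    gaps = g.gaps[:i] + [x + y] + inner + [x_rest + y_rest] + g.gaps[j + 2:]
    return Genome(genes, gaps)


def indel(g: Genome, k: int, delta: int) -> Genome:
    """Insert (delta > 0) or delete (delta < 0) nucleotides in gaps[k]."""
    new_size = g.gaps[k] + delta
    if new_size < 0:
        raise ValueError("cannot delete more nucleotides than the region holds")
    gaps = list(g.gaps)
    gaps[k] = new_size
    return Genome(list(g.genes), gaps)


source = Genome(genes=[1, 3, 2, 4], gaps=[2, 0, 5, 1, 3])
after_reversal = reversal(source, i=1, j=2, x=0, y=1)
after_indel = indel(after_reversal, k=3, delta=2)
```

In this sketch, a reversal only redistributes nucleotides between the two regions it cuts, so the sum of `gaps` stays constant; only an indel changes the total, which is why a model that tracks intergenic sizes needs indels to relate genomes with different amounts of intergenic material.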