Institute of Computing, University of Campinas (Unicamp), Campinas, São Paulo 13083-852, Brazil.
Center for Mathematics, Computation and Cognition, Federal University of ABC (UFABC), Santo André, São Paulo 09210-580, Brazil.
J Bioinform Comput Biol. 2020 Apr;18(2):2050006. doi: 10.1142/S0219720020500067. Epub 2020 Apr 24.
A central problem in Computational Biology is estimating the evolutionary distance between species. In most approaches, this distance accounts only for rearrangements, which are mutations that alter large segments of a species' genome. When genomes are represented as permutations, the problem of transforming one genome into another is equivalent to the problem of Sorting Permutations by Rearrangement Operations. The traditional approach assumes that all rearrangements are equally likely, so the goal is to find a shortest sequence of operations that sorts the permutation. However, studies have shown that some rearrangements are more likely than others, which makes a weighted approach more realistic. In a weighted approach, the goal is to find a sorting sequence of minimum total cost. This work introduces a new type of cost function, related to the amount of fragmentation a rearrangement causes. We present lower and upper bounds for the fragmentation-weighted problems and relate the unweighted and fragmentation-weighted approaches. Our main results are 2-approximation algorithms for five versions of this problem involving reversals and transpositions. We also give bounds on the diameters of these problems and provide an improved approximation factor for simple permutations when transpositions are considered.
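To make the contrast between unweighted and weighted sorting concrete, the following Python sketch applies reversals and transpositions to a permutation and totals a fragmentation-style cost. The abstract states only that the cost is related to the amount of fragmentation; the specific rule used here (one unit per cut that falls strictly inside the permutation) is an assumption for illustration, and the function names `reversal`, `transposition`, `cut_cost`, `reversal_cost`, and `transposition_cost` are hypothetical.

```python
from typing import Iterable, List


def reversal(perm: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment perm[i..j] (1-indexed, inclusive)."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


def transposition(perm: List[int], i: int, j: int, k: int) -> List[int]:
    """Exchange adjacent segments perm[i..j-1] and perm[j..k-1] (1-indexed),
    with 1 <= i < j < k <= n + 1."""
    return perm[:i - 1] + perm[j - 1:k - 1] + perm[i - 1:j - 1] + perm[k - 1:]


def cut_cost(n: int, cuts: Iterable[int]) -> int:
    """ASSUMED cost rule (not taken from the abstract): a cut at position p
    separates elements p-1 and p. Cuts at p = 1 or p = n + 1 lie at the ends
    of the permutation and fragment nothing, so they are not counted."""
    return sum(1 for p in set(cuts) if 1 < p <= n)


def reversal_cost(n: int, i: int, j: int) -> int:
    # A reversal cuts before position i and after position j.
    return cut_cost(n, [i, j + 1])


def transposition_cost(n: int, i: int, j: int, k: int) -> int:
    # A transposition cuts before positions i, j, and k.
    return cut_cost(n, [i, j, k])


perm = [3, 4, 1, 2, 5]
n = len(perm)

# One transposition sorts the permutation.
by_transposition = transposition(perm, 1, 3, 5)
cost_t = transposition_cost(n, 1, 3, 5)

# Three reversals also sort it, at a different total cost under the assumed rule.
steps = [(1, 4), (1, 2), (3, 4)]
by_reversals = perm
cost_r = 0
for i, j in steps:
    by_reversals = reversal(by_reversals, i, j)
    cost_r += reversal_cost(n, i, j)

print(by_transposition, cost_t)
print(by_reversals, cost_r)
```

Under this assumed rule, the number of operations and the total cost rank sorting sequences on different scales, which is why a minimum-length sequence need not be a minimum-cost one.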