Martinez Fábio V, Feijão Pedro, Braga Marília Dv, Stoye Jens
Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Avenida Costa e Silva, s-n, Campo Grande, 79070-900 MS Brazil ; Technische Fakultät and CeBiTec, Universität Bielefeld, Universitätsstr. 25, Bielefeld, 33615 Germany.
Technische Fakultät and CeBiTec, Universität Bielefeld, Universitätsstr. 25, Bielefeld, 33615 Germany.
Algorithms Mol Biol. 2015 Apr 1;10:13. doi: 10.1186/s13015-015-0041-9. eCollection 2015.
Structural variation in genomes can be revealed by many (dis)similarity measures. Rearrangement operations, such as the so called double-cut-and-join (DCJ), are large-scale mutations that can create complex changes and produce such variations in genomes. A basic task in comparative genomics is to find the rearrangement distance between two given genomes, i.e., the minimum number of rearragement operations that transform one given genome into another one. In a family-based setting, genes are grouped into gene families and efficient algorithms have already been presented to compute the DCJ distance between two given genomes. In this work we propose the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarities between genes. We prove that this new family-free DCJ distance problem is APX-hard and provide an integer linear program to its solution. We also study a family-free DCJ similarity and prove that its computation is NP-hard.
基因组中的结构变异可以通过许多(不)相似性度量来揭示。重排操作,比如所谓的双切割与连接(DCJ),是大规模突变,能够产生复杂变化并在基因组中产生此类变异。比较基因组学中的一个基本任务是找到两个给定基因组之间的重排距离,即把一个给定基因组转化为另一个所需的最少重排操作数。在基于家族的背景下,基因被分组到基因家族中,并且已经提出了高效算法来计算两个给定基因组之间的DCJ距离。在这项工作中,我们提出了在没有预先进行基因家族分配的情况下,直接利用基因之间的成对相似性来计算两个给定基因组的DCJ距离的问题。我们证明这个新的无家族DCJ距离问题是APX难的,并提供了一个整数线性规划来求解它。我们还研究了无家族DCJ相似性,并证明其计算是NP难的。