Cunha Luís, Sau Ignasi, Souza Uéverton
Instituto de Computação, Universidade Federal Fluminense, Niterói, Brazil.
IMPA, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil.
Algorithms Mol Biol. 2024 Dec 24;19(1):24. doi: 10.1186/s13015-024-00269-z.
Genome rearrangements are events where large blocks of DNA exchange places during evolution. The analysis of these events is a promising tool for understanding evolutionary genomics, providing data for phylogenetic reconstruction based on genome rearrangement measures. Many pairwise rearrangement distances have been proposed, based on finding the minimum number of rearrangement events to transform one genome into the other, using some predefined operation. When more than two genomes are considered, we have the more challenging problem of rearrangement-based phylogeny reconstruction. Given a set of genomes and a distance notion, there are at least two natural ways to define the "target" genome. On the one hand, finding a genome that minimizes the sum of the distances from this to any other, called the median genome. On the other hand, finding a genome that minimizes the maximum distance to any other, called the closest genome. Considering genomes as permutations of distinct integers, some distance metrics have been extensively studied. We investigate the median and closest problems on permutations over the following metrics: breakpoint distance, swap distance, block-interchange distance, short-block-move distance, and transposition distance. In biological applications some values are usually very small, such as the solution value d or the number k of input permutations. For each of these metrics and parameters d or k, we analyze the closest and the median problems from the viewpoint of parameterized complexity. We obtain the following results: NP-hardness for finding the median/closest permutation regarding some metrics of distance, even for only permutations; Polynomial kernels for the problems of finding the median permutation of all studied metrics, considering the target distance d as parameter; NP-hardness result for finding the closest permutation by short-block-moves; FPT algorithms and infeasibility of polynomial kernels for finding the closest permutation for some metrics when parameterized by the target distance d.
基因组重排是指在进化过程中大片段DNA交换位置的事件。对这些事件的分析是理解进化基因组学的一个有前景的工具,为基于基因组重排度量的系统发育重建提供数据。基于使用一些预定义操作找到将一个基因组转化为另一个基因组所需的最小重排事件数,已经提出了许多成对重排距离。当考虑两个以上的基因组时,我们面临基于重排的系统发育重建这一更具挑战性的问题。给定一组基因组和一个距离概念,至少有两种自然的方法来定义“目标”基因组。一方面,找到一个基因组,使其到任何其他基因组的距离之和最小,称为中位数基因组。另一方面,找到一个基因组,使其到任何其他基因组的最大距离最小,称为最接近基因组。将基因组视为不同整数的排列,一些距离度量已经得到了广泛研究。我们研究了以下度量下排列的中位数和最接近问题:断点距离、交换距离、块交换距离、短块移动距离和转座距离。在生物学应用中,一些值通常非常小,例如解值d或输入排列的数量k。对于这些度量以及参数d或k中的每一个,我们从参数化复杂性的角度分析最接近和中位数问题。我们得到以下结果:对于某些距离度量,找到中位数/最接近排列是NP难的,即使对于只有 个排列;考虑目标距离d作为参数,对于所有研究度量的中位数排列问题有多项式核;通过短块移动找到最接近排列的NP难结果;当以目标距离d为参数时,对于某些度量找到最接近排列的FPT算法和多项式核的不可行性。