逼近单边典范邻接数问题。

Approaching the One-Sided Exemplar Adjacency Number Problem.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.

DOI:10.1109/TCBB.2019.2913834

Abstract

The one-sided Exemplar Adjacency Number (EAN) is a known problem for computing the exemplar similarity between a generic linear genome G with gene duplications and an exemplar genome H (over the same set of n gene families). In this problem, we need to compute an exemplar genome G, which is a permutation obtained from G, such that the number of common adjacencies between G and H is maximized. Unfortunately, the problem is not only NP-hard but also NP-hard to approximate. In this paper, we approach the problem by relaxing the constraint such that a sub-permutation G obtained from G does not have to include all the gene families, but still needs to have a length at least k. Hence G is called a pseudo-exemplar genome. Then, a slightly more general problem (One-sided EAN+) is defined: compute a pseudo-exemplar genome G from G such that the number of common adjacencies between H and G is maximized. Certainly One-sided EAN+ contains One-sided EAN as a special case; moreover, it presents some flexibility in designing algorithms. First, we relax and formulate the One-sided EAN+ problem as the maximum independent set (MIS) on a colored interval graph and hence reduce the appearance of each gene to at most two times. We show that this new relaxation is still NP-complete, though a simple factor-2 approximation algorithm can be designed; moreover, we also prove that the problem cannot be approximated within 2-ε by a local search technique. We then show that this relaxed version is fixed-parameter tractable (FPT). Second, to ensure that each gene appears in G at most once, we use integer linear programming (ILP) to solve this problem. Finally, we implement our algorithm and compare it with the up-to-date software GREDU, with simulated signed and unsigned genomes. It turns out that our algorithm is more stable and can process genomes of length up to 12,000 for signed genomes (while GREDU can falter on such a large signed genome and it cannot handle unsigned genomes at all).

摘要

单边范例邻接数（EAN）是计算具有基因重复的通用线性基因组 G 与范例基因组 H 之间范例相似度的已知问题（在相同的 n 个基因家族集合上）。在这个问题中，我们需要计算一个范例基因组 G，它是从 G 中获得的排列，使得 G 和 H 之间的公共邻接数最大化。不幸的是，该问题不仅是 NP 难的，而且难以近似。在本文中，我们通过放宽约束来解决这个问题，使得从 G 获得的子排列 G 不一定包含所有的基因家族，但仍需要至少有 k 个长度。因此，G 被称为伪范例基因组。然后，定义了一个稍微更一般的问题（单边 EAN+）：从 G 中计算一个伪范例基因组 G，使得 H 和 G 之间的公共邻接数最大化。当然，单边 EAN+ 是单边 EAN 的一个特例；此外，它在设计算法方面具有一定的灵活性。首先，我们将单边 EAN+ 问题放松并形式化为彩色区间图上的最大独立集（MIS），从而将每个基因的出现次数减少到最多两次。我们表明，尽管可以设计一个简单的因子 2 近似算法，但这种新的放松仍然是 NP 完全的；此外，我们还证明了该问题不能通过局部搜索技术在 2-ε 内逼近。然后，我们证明这个放松的版本是固定参数可处理的（FPT）。其次，为了确保每个基因在 G 中只出现一次，我们使用整数线性规划（ILP）来解决这个问题。最后，我们实现了我们的算法，并与最新的软件 GREDU 进行了比较，使用了模拟的有符号和无符号基因组。结果表明，我们的算法更稳定，可以处理长达 12000 个长度的有符号基因组（而 GREDU 在如此大的有符号基因组上可能会失败，并且根本无法处理无符号基因组）。

相似文献

Approaching the One-Sided Exemplar Adjacency Number Problem.逼近单边典范邻接数问题。

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.

A Fast and Exact Algorithm for the Exemplar Breakpoint Distance.一种用于范例断点距离的快速精确算法。

J Comput Biol. 2016 May;23(5):337-46. doi: 10.1089/cmb.2015.0193. Epub 2016 Mar 8.

The zero exemplar distance problem.零范例距离问题。

J Comput Biol. 2011 Sep;18(9):1077-86. doi: 10.1089/cmb.2011.0097.

Efficient tools for computing the number of breakpoints and the number of adjacencies between two genomes with duplicate genes.用于计算具有重复基因的两个基因组之间断点数量和邻接数量的高效工具。

J Comput Biol. 2008 Oct;15(8):1093-115. doi: 10.1089/cmb.2008.0061.

An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance.基于范例断点距离的支架整数线性规划方法。

J Comput Biol. 2022 Sep;29(9):961-973. doi: 10.1089/cmb.2021.0399. Epub 2022 May 19.

A Dynamic Programming Algorithm For (1,2)-Exemplar Breakpoint Distance.一种用于（1,2）-示例断点距离的动态规划算法。

J Comput Biol. 2015 Jul;22(7):666-76. doi: 10.1089/cmb.2014.0200. Epub 2015 Jun 3.

Algorithms for sorting unsigned linear genomes by the DCJ operations.基于 DCJ 操作的无符号线性基因组排序算法。

Bioinformatics. 2011 Feb 1;27(3):311-6. doi: 10.1093/bioinformatics/btq674. Epub 2010 Dec 6.

Comparing genomes with duplications: a computational complexity point of view.从计算复杂性角度比较带有重复序列的基因组。

IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):523-34. doi: 10.1109/TCBB.2007.1069.

An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.一种用于计算具有重复基因的基因组的双切割连接距离的精确算法。

J Comput Biol. 2015 May;22(5):425-35. doi: 10.1089/cmb.2014.0096. Epub 2014 Dec 17.

An improved approximation algorithm for scaffold filling to maximize the common adjacencies.一种改进的支架填充近似算法，以最大化公共邻接。

IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):905-13. doi: 10.1109/TCBB.2013.100.

逼近单边典范邻接数问题。

Approaching the One-Sided Exemplar Adjacency Number Problem.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.

DOI:10.1109/TCBB.2019.2913834

PMID:31056506

Abstract

摘要

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

逼近单边典范邻接数问题。

Approaching the One-Sided Exemplar Adjacency Number Problem.

出版信息

相似文献

逼近单边典范邻接数问题。

Approaching the One-Sided Exemplar Adjacency Number Problem.

出版信息

相似文献