• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

逼近单边典范邻接数问题。

Approaching the One-Sided Exemplar Adjacency Number Problem.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.

DOI:10.1109/TCBB.2019.2913834
PMID:31056506
Abstract

The one-sided Exemplar Adjacency Number (EAN) is a known problem for computing the exemplar similarity between a generic linear genome G with gene duplications and an exemplar genome H (over the same set of n gene families). In this problem, we need to compute an exemplar genome G, which is a permutation obtained from G, such that the number of common adjacencies between G and H is maximized. Unfortunately, the problem is not only NP-hard but also NP-hard to approximate. In this paper, we approach the problem by relaxing the constraint such that a sub-permutation G obtained from G does not have to include all the gene families, but still needs to have a length at least k. Hence G is called a pseudo-exemplar genome. Then, a slightly more general problem (One-sided EAN+) is defined: compute a pseudo-exemplar genome G from G such that the number of common adjacencies between H and G is maximized. Certainly One-sided EAN+ contains One-sided EAN as a special case; moreover, it presents some flexibility in designing algorithms. First, we relax and formulate the One-sided EAN+ problem as the maximum independent set (MIS) on a colored interval graph and hence reduce the appearance of each gene to at most two times. We show that this new relaxation is still NP-complete, though a simple factor-2 approximation algorithm can be designed; moreover, we also prove that the problem cannot be approximated within 2-ε by a local search technique. We then show that this relaxed version is fixed-parameter tractable (FPT). Second, to ensure that each gene appears in G at most once, we use integer linear programming (ILP) to solve this problem. Finally, we implement our algorithm and compare it with the up-to-date software GREDU, with simulated signed and unsigned genomes. It turns out that our algorithm is more stable and can process genomes of length up to 12,000 for signed genomes (while GREDU can falter on such a large signed genome and it cannot handle unsigned genomes at all).

摘要

单边范例邻接数(EAN)是计算具有基因重复的通用线性基因组 G 与范例基因组 H 之间范例相似度的已知问题(在相同的 n 个基因家族集合上)。在这个问题中,我们需要计算一个范例基因组 G,它是从 G 中获得的排列,使得 G 和 H 之间的公共邻接数最大化。不幸的是,该问题不仅是 NP 难的,而且难以近似。在本文中,我们通过放宽约束来解决这个问题,使得从 G 获得的子排列 G 不一定包含所有的基因家族,但仍需要至少有 k 个长度。因此,G 被称为伪范例基因组。然后,定义了一个稍微更一般的问题(单边 EAN+):从 G 中计算一个伪范例基因组 G,使得 H 和 G 之间的公共邻接数最大化。当然,单边 EAN+ 是单边 EAN 的一个特例;此外,它在设计算法方面具有一定的灵活性。首先,我们将单边 EAN+ 问题放松并形式化为彩色区间图上的最大独立集(MIS),从而将每个基因的出现次数减少到最多两次。我们表明,尽管可以设计一个简单的因子 2 近似算法,但这种新的放松仍然是 NP 完全的;此外,我们还证明了该问题不能通过局部搜索技术在 2-ε 内逼近。然后,我们证明这个放松的版本是固定参数可处理的(FPT)。其次,为了确保每个基因在 G 中只出现一次,我们使用整数线性规划(ILP)来解决这个问题。最后,我们实现了我们的算法,并与最新的软件 GREDU 进行了比较,使用了模拟的有符号和无符号基因组。结果表明,我们的算法更稳定,可以处理长达 12000 个长度的有符号基因组(而 GREDU 在如此大的有符号基因组上可能会失败,并且根本无法处理无符号基因组)。

相似文献

1
Approaching the One-Sided Exemplar Adjacency Number Problem.逼近单边典范邻接数问题。
IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1946-1954. doi: 10.1109/TCBB.2019.2913834. Epub 2020 Dec 8.
2
A Fast and Exact Algorithm for the Exemplar Breakpoint Distance.一种用于范例断点距离的快速精确算法。
J Comput Biol. 2016 May;23(5):337-46. doi: 10.1089/cmb.2015.0193. Epub 2016 Mar 8.
3
The zero exemplar distance problem.零范例距离问题。
J Comput Biol. 2011 Sep;18(9):1077-86. doi: 10.1089/cmb.2011.0097.
4
Efficient tools for computing the number of breakpoints and the number of adjacencies between two genomes with duplicate genes.用于计算具有重复基因的两个基因组之间断点数量和邻接数量的高效工具。
J Comput Biol. 2008 Oct;15(8):1093-115. doi: 10.1089/cmb.2008.0061.
5
An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance.基于范例断点距离的支架整数线性规划方法。
J Comput Biol. 2022 Sep;29(9):961-973. doi: 10.1089/cmb.2021.0399. Epub 2022 May 19.
6
A Dynamic Programming Algorithm For (1,2)-Exemplar Breakpoint Distance.一种用于(1,2)-示例断点距离的动态规划算法。
J Comput Biol. 2015 Jul;22(7):666-76. doi: 10.1089/cmb.2014.0200. Epub 2015 Jun 3.
7
Algorithms for sorting unsigned linear genomes by the DCJ operations.基于 DCJ 操作的无符号线性基因组排序算法。
Bioinformatics. 2011 Feb 1;27(3):311-6. doi: 10.1093/bioinformatics/btq674. Epub 2010 Dec 6.
8
Comparing genomes with duplications: a computational complexity point of view.从计算复杂性角度比较带有重复序列的基因组。
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):523-34. doi: 10.1109/TCBB.2007.1069.
9
An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes.一种用于计算具有重复基因的基因组的双切割连接距离的精确算法。
J Comput Biol. 2015 May;22(5):425-35. doi: 10.1089/cmb.2014.0096. Epub 2014 Dec 17.
10
An improved approximation algorithm for scaffold filling to maximize the common adjacencies.一种改进的支架填充近似算法,以最大化公共邻接。
IEEE/ACM Trans Comput Biol Bioinform. 2013 Jul-Aug;10(4):905-13. doi: 10.1109/TCBB.2013.100.