Suppr超能文献

一种用于DCJ距离的精确且快速的SAT公式化方法。

An Exact and Fast SAT Formulation for the DCJ Distance.

作者信息

Sarnaik Aaryan M, Chen Ke, Diaz Austin, Shao Mingfu

机构信息

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA.

Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.

出版信息

bioRxiv. 2024 Nov 8:2024.11.05.622153. doi: 10.1101/2024.11.05.622153.

Abstract

Reducing into a satisfiability (SAT) formulation has been proven effective in solving certain NP-hard problems. In this work, we extend this research by presenting a novel SAT formulation for computing the double-cut-and-join (DCJ) distance between two genomes with duplicate genes. The DCJ distance serves as a crucial metric in studying genome rearrangement. Previous work achieved an exact solution by transforming it into a maximum cycle decomposition (MCD) problem on the corresponding adjacency graph of two genomes, followed by reducing this problem into an integer linear programming (ILP) formulation. Using both simulated datasets and real genomic datasets, we firmly conclude that the SAT-based method scales much better and runs faster than using ILP, being able to solve a whole new range of instances which the ILP-based method cannot solve in a reasonable amount of time. This underscores the SAT-based approach as a versatile and more powerful alternative to ILP, with promising implications for broader applications in computational biology.

摘要

事实证明,将问题规约为可满足性(SAT)公式在解决某些NP难问题方面是有效的。在这项工作中,我们通过提出一种新颖的SAT公式来扩展这项研究,该公式用于计算具有重复基因的两个基因组之间的双切割连接(DCJ)距离。DCJ距离是研究基因组重排的关键指标。先前的工作通过将其转化为两个基因组相应邻接图上的最大循环分解(MCD)问题,然后将该问题规约为整数线性规划(ILP)公式,从而得到了精确解。使用模拟数据集和真实基因组数据集,我们坚定地得出结论:基于SAT的方法比使用ILP的方法扩展性更好、运行速度更快,能够解决一系列基于ILP的方法在合理时间内无法解决的全新实例。这突出了基于SAT的方法作为ILP的一种通用且更强大的替代方法,对计算生物学中的更广泛应用具有潜在的意义。

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验