Suppr超能文献

重新审视图遍历编辑距离及其变体的复杂性和算法。

Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.

作者信息

Qiu Yutong, Shen Yihang, Kingsford Carl

机构信息

Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213, PA, USA.

出版信息

Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6.

Abstract

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/ .

摘要

图遍历编辑距离(GTED)由埃布拉欣普尔·博罗杰尼等人于2018年提出,是一种精妙的距离度量,定义为从两个边标记图中的欧拉路径重建的字符串之间的最小编辑距离。GTED可通过直接比较德布鲁因图来推断物种之间的进化关系,而无需进行基因组组装这种计算成本高昂且容易出错的过程。埃布拉欣普尔·博罗杰尼等人(2018年)为GTED提出了两种整数线性规划(ILP)公式,并声称GTED是多项式可解的,因为其中一个ILP的线性规划松弛总是能产生最优整数解。GTED是多项式可解的这一说法与现有字符串到图匹配问题的复杂性结果相矛盾。我们通过证明GTED是NP完全问题,并表明埃布拉欣普尔·博罗杰尼等人提出的ILP并没有解决GTED,而是解决了GTED的一个下界,且不能在多项式时间内求解,从而解决了复杂性结果中的这一冲突。此外,我们提供了前两个正确的GTED的ILP公式,并评估了它们的经验效率。这些结果为比较基因组图提供了坚实的算法基础,并指出了启发式方法的方向。用于重现实验结果的源代码可在https://github.com/Kingsford-Group/gtednewilp/获取。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8543/11057191/d4114ba423b3/13015_2024_262_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验