• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

带环线和收费公路问题的近似和精确优化算法,具有重复、缺失、部分标记和不确定的测量值。

Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.

机构信息

Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA.

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.

出版信息

J Comput Biol. 2024 Oct;31(10):908-926. doi: 10.1089/cmb.2024.0661. Epub 2024 Oct 10.

DOI:10.1089/cmb.2024.0661
PMID:39387260
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11698667/
Abstract

The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.

摘要

收费公路问题旨在根据无序的两两距离重建一组一维点。收费公路在生物应用中很常见,如分子结构确定、基因组测序、串联质谱和分子纠错码。在距离的噪声观测下,收费公路问题是 NP 难问题,并且当使用传统算法时,可能需要指数时间和空间来解决。为了解决这个问题,我们通过优化的视角重新构建了有噪声的收费公路问题,试图同时找到未知的点集和一个与输入距离相似度最大化的排列。我们的核心贡献是一套能够稳健地解决这个新目标的算法。这包括一个双层优化框架,可以有效地解决多达 100000 个点的收费公路实例。我们表明,这个框架可以扩展到具有特定领域约束的场景,包括重复、缺失和部分标记的距离。使用这些,我们还将我们的算法扩展到分布在圆上的点(环城公路问题)。对于需要全局最优性的小规模应用,我们制定了一个整数线性规划(ILP),(i)接受来自通用凸函数族的目标,(ii)使用扩展形式来减少二进制变量的数量。在合成和真实部分消化数据上,我们的双层算法在具有挑战性的场景中实现了最先进的可扩展性,性能与竞争基线相匹配或超过。在小规模实例中,我们的 ILP 能够有效地恢复真实的分配,并产生与我们的交替算法相匹配或超过的重建。我们的实现可在 https://github.com/Kingsford-Group/turnpikesolvermm 上获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/0f13ef4f15cb/cmb.2024.0661_figure5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/a76a7837fceb/cmb.2024.0661_figure1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c4390f4ddce6/cmb.2024.0661_figure2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c030eff8f908/cmb.2024.0661_figure3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c0295a5bceaf/cmb.2024.0661_figure4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/0f13ef4f15cb/cmb.2024.0661_figure5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/a76a7837fceb/cmb.2024.0661_figure1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c4390f4ddce6/cmb.2024.0661_figure2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c030eff8f908/cmb.2024.0661_figure3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/c0295a5bceaf/cmb.2024.0661_figure4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/0f13ef4f15cb/cmb.2024.0661_figure5.jpg

相似文献

1
Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.带环线和收费公路问题的近似和精确优化算法,具有重复、缺失、部分标记和不确定的测量值。
J Comput Biol. 2024 Oct;31(10):908-926. doi: 10.1089/cmb.2024.0661. Epub 2024 Oct 10.
2
A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n Pairwise Distances in n Steps for the Sequencing Problem: II. Algorithm.用于测序问题的一种在\(n\)步内从\(n\)个两两距离的多重集中重建一组点的简单方法:II. 算法。
J Comput Biol. 2016 Dec;23(12):934-942. doi: 10.1089/cmb.2016.0046. Epub 2016 Jun 16.
3
A Simple Approach to the Reconstruction of a Set of Points from the Multiset of n(2) Pairwise Distances in n(2) Steps for the Sequencing Problem: I. Theory.一种在\(n^2\)步内从\(n^2\)个两两距离的多重集中重构一组点以解决测序问题的简单方法:I. 理论
J Comput Biol. 2016 Sep;23(9):769-75. doi: 10.1089/cmb.2016.0044. Epub 2016 Jun 16.
4
Reconstruction of sequence from its circular partial sums for cyclopeptide sequencing problem.用于环肽测序问题的从其循环部分和重建序列
J Bioinform Comput Biol. 2015 Feb;13(1):1540008. doi: 10.1142/S0219720015400089.
5
GenHap: a novel computational method based on genetic algorithms for haplotype assembly.GenHap:一种基于遗传算法的新型单倍型组装计算方法。
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.
6
Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination.使用平方和松弛法整合NOE和RDC以确定蛋白质结构。
J Biomol NMR. 2017 Jul;68(3):163-185. doi: 10.1007/s10858-017-0108-7. Epub 2017 Jun 14.
7
A Simple Approach to the Reconstruction of a Set of Points from the Multiset of Pairwise Distances in n Steps for the Sequencing Problem: III. Noise Inputs for the Beltway Case.一种用于测序问题的在\(n\)步内从成对距离的多重集中重建点集的简单方法:III. 环路情况的噪声输入
J Comput Biol. 2019 Jan;26(1):68-75. doi: 10.1089/cmb.2018.0078. Epub 2018 Aug 17.
8
A Scalable Optimization Mechanism for Pairwise Based Discrete Hashing.一种用于基于成对的离散哈希的可扩展优化机制。
IEEE Trans Image Process. 2021;30:1130-1142. doi: 10.1109/TIP.2020.3040536. Epub 2020 Dec 15.
9
The median problems on linear multichromosomal genomes: graph representation and fast exact solutions.线性多染色体基因组上的中位数问题:图形表示与快速精确解
J Comput Biol. 2010 Sep;17(9):1195-211. doi: 10.1089/cmb.2010.0106.
10
Efficient Minimum Flow Decomposition via Integer Linear Programming.通过整数线性规划实现有效的最小流量分解。
J Comput Biol. 2022 Nov;29(11):1252-1267. doi: 10.1089/cmb.2022.0257. Epub 2022 Oct 18.

本文引用的文献

1
A Simple Approach to the Reconstruction of a Set of Points from the Multiset of Pairwise Distances in n Steps for the Sequencing Problem: III. Noise Inputs for the Beltway Case.一种用于测序问题的在\(n\)步内从成对距离的多重集中重建点集的简单方法:III. 环路情况的噪声输入
J Comput Biol. 2019 Jan;26(1):68-75. doi: 10.1089/cmb.2018.0078. Epub 2018 Aug 17.
2
A fast exact sequential algorithm for the partial digest problem.一种用于部分消化问题的快速精确序列算法。
BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):510. doi: 10.1186/s12859-016-1365-2.
3
Reconstruction of sequence from its circular partial sums for cyclopeptide sequencing problem.
用于环肽测序问题的从其循环部分和重建序列
J Bioinform Comput Biol. 2015 Feb;13(1):1540008. doi: 10.1142/S0219720015400089.
4
Multiplex de novo sequencing of peptide antibiotics.肽类抗生素的多重从头测序
J Comput Biol. 2011 Nov;18(11):1371-81. doi: 10.1089/cmb.2011.0158. Epub 2011 Oct 28.
5
Simplified partial digest problem: enumerative and dynamic programming algorithms.简化的部分消化问题:枚举和动态规划算法
IEEE/ACM Trans Comput Biol Bioinform. 2007 Oct-Dec;4(4):668-80. doi: 10.1109/TCBB.2007.1060.
6
An exponential example for a partial digest mapping algorithm.一个用于部分酶切图谱算法的指数示例。
J Comput Biol. 1994 Fall;1(3):235-9. doi: 10.1089/cmb.1994.1.235.
7
A partial digest approach to restriction site mapping.一种用于限制性酶切位点图谱绘制的部分酶切方法。
Proc Int Conf Intell Syst Mol Biol. 1993;1:362-70.
8
Physical mapping of chromosomes using unique probes.使用独特探针进行染色体的物理图谱绘制。
J Comput Biol. 1995 Summer;2(2):159-84. doi: 10.1089/cmb.1995.2.159.
9
A simple method for DNA restriction site mapping.一种用于DNA限制性酶切位点图谱绘制的简单方法。
Nucleic Acids Res. 1976 Sep;3(9):2387-98. doi: 10.1093/nar/3.9.2387.