Suppr超能文献

带环线和收费公路问题的近似和精确优化算法,具有重复、缺失、部分标记和不确定的测量值。

Approximate and Exact Optimization Algorithms for the Beltway and Turnpike Problems with Duplicated, Missing, Partially Labeled, and Uncertain Measurements.

机构信息

Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh Pennsylvania, USA.

Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.

出版信息

J Comput Biol. 2024 Oct;31(10):908-926. doi: 10.1089/cmb.2024.0661. Epub 2024 Oct 10.

Abstract

The Turnpike problem aims to reconstruct a set of one-dimensional points from their unordered pairwise distances. Turnpike arises in biological applications such as molecular structure determination, genomic sequencing, tandem mass spectrometry, and molecular error-correcting codes. Under noisy observation of the distances, the Turnpike problem is NP-hard and can take exponential time and space to solve when using traditional algorithms. To address this, we reframe the noisy Turnpike problem through the lens of optimization, seeking to simultaneously find the unknown point set and a permutation that maximizes similarity to the input distances. Our core contribution is a suite of algorithms that robustly solve this new objective. This includes a bilevel optimization framework that can efficiently solve Turnpike instances with up to 100,000 points. We show that this framework can be extended to scenarios with domain-specific constraints that include duplicated, missing, and partially labeled distances. Using these, we also extend our algorithms to work for points distributed on a circle (the Beltway problem). For small-scale applications that require global optimality, we formulate an integer linear program (ILP) that (i) accepts an objective from a generic family of convex functions and (ii) uses an extended formulation to reduce the number of binary variables. On synthetic and real partial digest data, our bilevel algorithms achieved state-of-the-art scalability across challenging scenarios with performance that matches or exceeds competing baselines. On small-scale instances, our ILP efficiently recovered ground-truth assignments and produced reconstructions that match or exceed our alternating algorithms. Our implementations are available at https://github.com/Kingsford-Group/turnpikesolvermm.

摘要

收费公路问题旨在根据无序的两两距离重建一组一维点。收费公路在生物应用中很常见,如分子结构确定、基因组测序、串联质谱和分子纠错码。在距离的噪声观测下,收费公路问题是 NP 难问题,并且当使用传统算法时,可能需要指数时间和空间来解决。为了解决这个问题,我们通过优化的视角重新构建了有噪声的收费公路问题,试图同时找到未知的点集和一个与输入距离相似度最大化的排列。我们的核心贡献是一套能够稳健地解决这个新目标的算法。这包括一个双层优化框架,可以有效地解决多达 100000 个点的收费公路实例。我们表明,这个框架可以扩展到具有特定领域约束的场景,包括重复、缺失和部分标记的距离。使用这些,我们还将我们的算法扩展到分布在圆上的点(环城公路问题)。对于需要全局最优性的小规模应用,我们制定了一个整数线性规划(ILP),(i)接受来自通用凸函数族的目标,(ii)使用扩展形式来减少二进制变量的数量。在合成和真实部分消化数据上,我们的双层算法在具有挑战性的场景中实现了最先进的可扩展性,性能与竞争基线相匹配或超过。在小规模实例中,我们的 ILP 能够有效地恢复真实的分配,并产生与我们的交替算法相匹配或超过的重建。我们的实现可在 https://github.com/Kingsford-Group/turnpikesolvermm 上获得。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/583e/11698667/a76a7837fceb/cmb.2024.0661_figure1.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验