Infrared: a declarative tree decomposition-powered framework for bioinformatics.

Author information

Yao Hua-Ting, Marchand Bertrand, Berkemer Sarah J, Ponty Yann, Will Sebastian

Affiliations

LIX, CNRS UMR 7161, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France.

Department of Theoretical Chemistry, University of Vienna, Vienna, Austria.

Publication information

Algorithms Mol Biol. 2024 Mar 16;19(1):13. doi: 10.1186/s13015-024-00258-2.

DOI:10.1186/s13015-024-00258-2

PMID:38493130

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10943887/

Abstract

MOTIVATION

Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using Dynamic Programming (DP). However, such exact methods are typically tailored towards specific settings, complex to develop, and hard to implement and adapt to problem variations.

METHODS

We introduce the Infrared framework to overcome such hindrances for a large class of problems. Its underlying paradigm is tailored toward problems that can be declaratively formalized as sparse feature networks, a generalization of constraint networks. Classic Boolean constraints specify a search space of putative solutions, which are evaluated through a combination of features. Problems are then solved using generic cluster tree elimination algorithms over a tree decomposition of the feature network. Their overall complexities are linear in the number of variables and exponential only in the treewidth of the feature network. For sparse feature networks, which have low to moderate treewidth, these algorithms find optimal solutions or generate controlled samples with practical efficiency.
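To make the complexity claim concrete, here is a minimal pure-Python sketch of elimination on the simplest sparse feature network: a chain, where each constraint and feature touches only two adjacent variables (treewidth 1). It does not use the Infrared library. The function names and the toy scoring (`differ`, `gc_bonus`) are assumptions made for illustration only. The dynamic program costs O(n · d²) for n variables over a domain of size d, while exhaustive enumeration costs O(dⁿ).

```python
from itertools import product


def eliminate_chain(n, domain, allowed, weight):
    """Best total weight over x_0..x_{n-1} in a chain-shaped feature network.

    allowed(u, v): Boolean constraint on every adjacent pair (x_i, x_{i+1}).
    weight(u, v):  feature value contributed by every adjacent pair.
    """
    neg_inf = float("-inf")
    # message[u]: best weight of the already eliminated suffix, given x_i = u
    message = {u: 0.0 for u in domain}
    for _ in range(n - 1):
        # Eliminate the current last variable by maximizing over its values.
        message = {
            u: max(
                (weight(u, v) + message[v] for v in domain if allowed(u, v)),
                default=neg_inf,
            )
            for u in domain
        }
    return max(message.values())


def brute_force(n, domain, allowed, weight):
    """Reference solution by enumerating all |domain|^n assignments."""
    best = float("-inf")
    for xs in product(domain, repeat=n):
        pairs = list(zip(xs, xs[1:]))
        if all(allowed(u, v) for u, v in pairs):
            best = max(best, sum(weight(u, v) for u, v in pairs))
    return best


# Hypothetical toy instance: adjacent letters must differ; G/C neighbors score 1.
def differ(u, v):
    return u != v


def gc_bonus(u, v):
    return 1.0 if {u, v} == {"G", "C"} else 0.0


print(eliminate_chain(6, "ACGU", differ, gc_bonus))
print(brute_force(6, "ACGU", differ, gc_bonus))
```

Both calls return the same optimum. Only the elimination version stays tractable as n grows, because each step manipulates a table indexed by a single variable.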

RESULTS

Implementing these methods, the Infrared software allows Python programmers to rapidly develop exact optimization and sampling applications based on efficient tree decomposition-based processing. Instead of directly coding specialized algorithms, problems are declaratively modeled as sets of variables over finite domains, whose dependencies are captured by constraints and functions. Such models are then automatically solved by generic DP algorithms. To illustrate the applicability of Infrared in bioinformatics and guide new users, we model and discuss variants of bioinformatics applications. We provide reimplementations and extensions of methods for RNA design, RNA sequence-structure alignment, parsimony-driven inference of ancestral traits in phylogenetic trees/networks, and design of coding sequences. Moreover, we demonstrate multidimensional Boltzmann sampling. These applications of the framework, together with our novel results, underline the practical relevance of Infrared. Remarkably, the achieved complexities are typically equivalent to those of specialized algorithms and implementations.
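The declarative workflow described above (state variables, constraints, and functions, then hand the model to a generic solver) can be illustrated with a small stand-in written in plain Python. The `Model` class and `partition_function` below are hypothetical names for this sketch and are not the Infrared API. The solver is ordinary variable (bucket) elimination, a simpler relative of cluster tree elimination. It computes the partition function, the quantity that underlies Boltzmann sampling, by treating each constraint as a 0/1 factor and each function f as a factor exp(f).

```python
import math
from itertools import product


class Model:
    """Toy declarative model: finite-domain variables plus local factors."""

    def __init__(self):
        self.domains = []  # domains[i]: tuple of values of variable i
        self.factors = []  # (variable indices, callable giving a weight >= 0)

    def add_variables(self, count, domain):
        first = len(self.domains)
        self.domains += [tuple(domain)] * count
        return list(range(first, first + count))

    def add_constraint(self, variables, pred):
        # A Boolean constraint acts as a 0/1 factor.
        self.factors.append(
            (tuple(variables), lambda *xs: 1.0 if pred(*xs) else 0.0))

    def add_function(self, variables, fn):
        # A feature f contributes the Boltzmann factor exp(f).
        self.factors.append((tuple(variables), lambda *xs: math.exp(fn(*xs))))


def partition_function(model, order):
    """Sum of weight products over all assignments, eliminating along order."""
    dom = model.domains
    # Tabulate every factor over the domains of its own variables.
    factors = [
        (vs, {xs: fn(*xs) for xs in product(*(dom[v] for v in vs))})
        for vs, fn in model.factors
    ]
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        scope = tuple(sorted({v for vs, _ in bucket for v in vs} - {var}))
        table = {}
        for xs in product(*(dom[v] for v in scope)):
            env = dict(zip(scope, xs))
            total = 0.0
            for val in dom[var]:  # sum the eliminated variable out
                env[var] = val
                term = 1.0
                for vs, t in bucket:
                    term *= t[tuple(env[v] for v in vs)]
                total += term
            table[xs] = total
        factors.append((scope, table))
    result = 1.0
    for _, t in factors:  # every remaining factor now has empty scope
        result *= t[()]
    return result


# Hypothetical toy model: 5 letters, neighbors differ, G/C neighbors favored.
model = Model()
xs = model.add_variables(5, "ACGU")
for i, j in zip(xs, xs[1:]):
    model.add_constraint((i, j), lambda u, v: u != v)
    model.add_function((i, j), lambda u, v: 1.0 if {u, v} == {"G", "C"} else 0.0)

z = partition_function(model, list(reversed(xs)))
print(z)
```

The size of each intermediate table depends on how many variables share factors with the eliminated one. This is why a good elimination order, or equivalently a tree decomposition of small width, keeps the computation cheap for sparse models.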

AVAILABILITY

Infrared is available at https://amibio.gitlabpages.inria.fr/Infrared with extensive documentation, including various usage examples and API reference; it can be installed using Conda or from source.
