ACE: adaptive cluster expansion for maximum entropy graphical model inference.

Affiliations

Departments of Chemical Engineering and Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139, USA.

Laboratoire de Physique Statistique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France.

Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Université, Paris, France.

Publication information

Bioinformatics. 2016 Oct 15;32(20):3089-3097. doi: 10.1093/bioinformatics/btw328. Epub 2016 Jun 21.

DOI:10.1093/bioinformatics/btw328

PMID:27329863

Abstract

MOTIVATION

Graphical models are often employed to interpret patterns of correlations observed in data through a network of interactions between the variables. Recently, Ising/Potts models, also known as Markov random fields, have been productively applied to diverse problems in biology, including the prediction of structural contacts from protein sequence data and the description of neural activity patterns. However, inference of such models is a challenging computational problem that cannot be solved exactly. Here, we describe the adaptive cluster expansion (ACE) method to quickly and accurately infer Ising or Potts models based on correlation data. ACE avoids overfitting by constructing a sparse network of interactions sufficient to reproduce the observed correlation data within the statistical error expected due to finite sampling. When convergence of the ACE algorithm is slow, we combine it with a Boltzmann Machine Learning algorithm (BML). We illustrate this method on a variety of biological and artificial datasets and compare it to state-of-the-art approximate methods such as Gaussian and pseudo-likelihood inference.
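Boltzmann machine learning fits the fields h_i and couplings J_ij by gradient ascent on the log-likelihood. The gradient is simply the difference between data and model statistics: the update for h_i is proportional to p_i(data) - p_i(model), and the update for J_ij is proportional to p_ij(data) - p_ij(model). The sketch below is a minimal illustration, not the ACE implementation. It assumes binary spins s_i in {0, 1} with P(s) proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j). It also computes model averages by exact enumeration of all 2^N states, which is only feasible for a handful of variables; practical implementations typically estimate these averages by Monte Carlo sampling. The function names `model_stats` and `bml` and the learning-rate and step-count defaults are hypothetical choices made for illustration.

```python
import itertools
import math


def model_stats(h, J):
    """Exact single-site frequencies p_i = <s_i> and pair frequencies
    p_ij = <s_i s_j> (i < j) of a small Ising model with s_i in {0, 1},
    computed by enumerating all 2^N configurations (illustration only)."""
    n = len(h)
    p1 = [0.0] * n
    p2 = [[0.0] * n for _ in range(n)]
    z = 0.0
    for s in itertools.product((0, 1), repeat=n):
        # Log of the unnormalized Boltzmann weight of configuration s.
        log_w = sum(h[i] * s[i] for i in range(n))
        log_w += sum(J[i][j] * s[i] * s[j]
                     for i in range(n) for j in range(i + 1, n))
        w = math.exp(log_w)
        z += w
        for i in range(n):
            if s[i]:
                p1[i] += w
                for j in range(i + 1, n):
                    if s[j]:
                        p2[i][j] += w
    return [x / z for x in p1], [[x / z for x in row] for row in p2]


def bml(p1_data, p2_data, lr=1.0, steps=5000):
    """Boltzmann machine learning with exact gradients: repeatedly move
    h and J along (data statistics - model statistics)."""
    n = len(p1_data)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]  # only entries with i < j are used
    for _ in range(steps):
        p1, p2 = model_stats(h, J)
        for i in range(n):
            h[i] += lr * (p1_data[i] - p1[i])
            for j in range(i + 1, n):
                J[i][j] += lr * (p2_data[i][j] - p2[i][j])
    return h, J


# Usage: recover a known 3-spin model from its exact statistics.
h_true = [-0.5, 0.2, -1.0]
J_true = [[0.0, 0.8, 0.0], [0.0, 0.0, -0.6], [0.0, 0.0, 0.0]]
p1_data, p2_data = model_stats(h_true, J_true)
h_fit, J_fit = bml(p1_data, p2_data)
```

Every sufficient statistic here is 0 or 1, so the curvature of the log-likelihood is bounded by 0.25 times the number of parameters (1.5 in this three-spin example). A fixed step of 1.0 therefore stays below the 2/1.5 stability limit for gradient ascent on this concave objective.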

RESULTS

We show that ACE accurately reproduces the true parameters of the underlying model when they are known, and yields accurate statistical descriptions of both biological and artificial data. Models inferred by ACE more accurately describe the statistics of the data, including both the constrained low-order correlations and unconstrained higher-order correlations, compared to those obtained by faster Gaussian and pseudo-likelihood methods. These alternative approaches can recover the structure of the interaction network but typically not the correct strength of interactions, resulting in less accurate generative models.
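The comparison above concerns how well inferred models reproduce the data. This covers both the one- and two-point correlations that the fit constrains and the higher-order (for example, three-point) correlations that it does not, judged relative to the fluctuations expected from finite sampling. One way to make such a comparison concrete is sketched below. It assumes B independent samples of binary 0/1 variables and uses a binomial estimate sqrt(p(1 - p)/B) of the sampling error on each frequency p. The helper names `empirical_stats` and `normalized_error` and the clipping rule are hypothetical. The error measures used in the ACE paper and software may be defined differently.

```python
import itertools
import math


def empirical_stats(samples, order):
    """Empirical frequency with which all variables in a subset equal 1, for
    every subset of size `order` (1 = single sites, 2 = pairs, 3 = triplets)."""
    n = len(samples[0])
    b = len(samples)
    return {
        subset: sum(all(s[i] for i in subset) for s in samples) / b
        for subset in itertools.combinations(range(n), order)
    }


def normalized_error(p_model, p_data, n_samples):
    """Root-mean-square deviation between model and data frequencies, with each
    term divided by the binomial sampling error sqrt(p (1 - p) / B)."""
    total = 0.0
    for key, p in p_data.items():
        # Assumption: clip p away from 0 and 1 so the error estimate never vanishes.
        p_clip = min(max(p, 1.0 / n_samples), 1.0 - 1.0 / n_samples)
        sigma = math.sqrt(p_clip * (1.0 - p_clip) / n_samples)
        total += ((p_model[key] - p) / sigma) ** 2
    return math.sqrt(total / len(p_data))


# Usage with four toy samples of three binary variables.
samples = [(1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
pairs = empirical_stats(samples, 2)
```

If the model were the true distribution generating the data, each scaled deviation would have roughly unit variance. Errors of order 1 are therefore consistent with sampling noise, while values well above 1 flag statistics the model fails to capture. Passing `order=3` gives triplet frequencies, which a pairwise model does not fit, so they test it as a generative model.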

AVAILABILITY AND IMPLEMENTATION

The ACE source code, user manual and tutorials with the example data and filtered correlations described herein are freely available on GitHub at https://github.com/johnbarton/ACE.

CONTACTS

jpbarton@mit.edu, cocco@lps.ens.fr

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

MPF-BML: a standalone GUI-based package for maximum entropy model inference.

Bioinformatics. 2020 Apr 1;36(7):2278-2279. doi: 10.1093/bioinformatics/btz925.

3off2: A network reconstruction algorithm based on 2-point and 3-point information statistics.

BMC Bioinformatics. 2016 Jan 20;17 Suppl 2(Suppl 2):12. doi: 10.1186/s12859-015-0856-x.

A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i383-i391. doi: 10.1093/bioinformatics/btab295.

Inference of compressed Potts graphical models.

Phys Rev E. 2020 Jan;101(1-1):012309. doi: 10.1103/PhysRevE.101.012309.

Large pseudocounts and L2-norm penalties are necessary for the mean-field inference of Ising and Potts models.

Phys Rev E Stat Nonlin Soft Matter Phys. 2014 Jul;90(1):012132. doi: 10.1103/PhysRevE.90.012132. Epub 2014 Jul 28.

NetRAX: accurate and fast maximum likelihood phylogenetic network inference.

Bioinformatics. 2022 Aug 2;38(15):3725-3733. doi: 10.1093/bioinformatics/btac396.

Gene network inference by fusing data from diverse distributions.

Bioinformatics. 2015 Jun 15;31(12):i230-9. doi: 10.1093/bioinformatics/btv258.

Clustering of temporal gene expression data with mixtures of mixed effects models with a penalized likelihood.

Bioinformatics. 2019 Mar 1;35(5):778-786. doi: 10.1093/bioinformatics/bty696.

Rapid genotype refinement for whole-genome sequencing data using multi-variate normal distributions.

Bioinformatics. 2016 Aug 1;32(15):2306-12. doi: 10.1093/bioinformatics/btw097. Epub 2016 Mar 9.

Cited by

Symmetry, gauge freedoms, and the interpretability of sequence-function relationships.

Phys Rev Res. 2025 Apr-Jun;7(2). doi: 10.1103/physrevresearch.7.023005. Epub 2025 Apr 2.

Evolutionary Sequence and Structural Basis for the Epistatic Origins of Drug Resistance in HIV.

bioRxiv. 2025 May 2:2025.04.30.651576. doi: 10.1101/2025.04.30.651576.

Gauge fixing for sequence-function relationships.

PLoS Comput Biol. 2025 Mar 20;21(3):e1012818. doi: 10.1371/journal.pcbi.1012818. eCollection 2025.

Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information.

Proc Natl Acad Sci U S A. 2024 Jun 25;121(26):e2312335121. doi: 10.1073/pnas.2312335121. Epub 2024 Jun 18.

Gauge fixing for sequence-function relationships.

bioRxiv. 2024 Jun 24:2024.05.12.593772. doi: 10.1101/2024.05.12.593772.

Symmetry, gauge freedoms, and the interpretability of sequence-function relationships.

bioRxiv. 2025 Mar 17:2024.05.12.593774. doi: 10.1101/2024.05.12.593774.

Towards parsimonious generative modeling of RNA families.

Nucleic Acids Res. 2024 Jun 10;52(10):5465-5477. doi: 10.1093/nar/gkae289.

GENERALIST: A latent space based generative model for protein sequence families.

PLoS Comput Biol. 2023 Nov 27;19(11):e1011655. doi: 10.1371/journal.pcbi.1011655. eCollection 2023 Nov.

Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis.

Nat Commun. 2023 Nov 17;14(1):7457. doi: 10.1038/s41467-023-42550-6.

Infer global, predict local: Quantity-relevance trade-off in protein fitness predictions from sequence data.

PLoS Comput Biol. 2023 Oct 26;19(10):e1011521. doi: 10.1371/journal.pcbi.1011521. eCollection 2023 Oct.
