Protein sequence modelling with Bayesian flow networks.

Author information

Atkinson Timothy, Barrett Thomas D, Cameron Scott, Guloglu Bora, Greenig Matthew, Tan Charlie B, Robinson Louis, Graves Alex, Copoiu Liviu, Laterre Alexandre

Affiliation

InstaDeep, 5 Merchant Square, London, W2 1AY, England.

Publication information

Nat Commun. 2025 Apr 3;16(1):3197. doi: 10.1038/s41467-025-58250-2.

DOI:10.1038/s41467-025-58250-2

PMID:40180946

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11968962/

Abstract

Exploring the vast and largely uncharted territory of amino acid sequences is crucial for understanding complex protein functions and the engineering of novel therapeutic proteins. Whilst generative machine learning has advanced protein sequence modelling, no existing approach is proficient in both unconditional and conditional generation. In this work, we propose that Bayesian Flow Networks (BFNs), a recently introduced framework for generative modelling, can address these challenges. We present ProtBFN, a 650M parameter model trained on protein sequences curated from UniProtKB, which generates natural-like, diverse, structurally coherent, and novel protein sequences, significantly outperforming leading autoregressive and discrete diffusion models. Further, we fine-tune ProtBFN on heavy chains from the Observed Antibody Space to obtain an antibody-specific model, AbBFN, which we use to evaluate zero-shot conditional generation capabilities. AbBFN is found to be competitive with or better than antibody-specific BERT-style models when applied to predicting individual framework or complementarity-determining regions.
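
The abstract relies on the BFN idea of refining a distribution over tokens through repeated Bayesian updates. As a minimal sketch, assuming the discrete-data formulation of the original BFN paper (Graves et al., 2023) as we understand it, the code below applies those updates to a single sequence position whose true residue is known (the training-time setting). It is not ProtBFN's implementation: the 20-letter alphabet, the accuracy values and the names `sender_sample`, `bayesian_update` and `run_updates` are hypothetical.

```python
import math
import random

# Assumed alphabet: the 20 standard amino acids (K = 20). ProtBFN's real
# vocabulary (special tokens, non-standard residues) may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def sender_sample(x_index, alpha, rng):
    """Noisy observation of the true class: y ~ N(alpha * (K * e_x - 1), alpha * K * I)."""
    std = math.sqrt(alpha * K)
    return [rng.gauss(alpha * ((K if k == x_index else 0) - 1), std) for k in range(K)]


def bayesian_update(log_theta, y):
    """Categorical posterior in log space: theta'_k is proportional to exp(y_k) * theta_k."""
    logits = [lt + yk for lt, yk in zip(log_theta, y)]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def run_updates(true_residue, alphas, seed=0):
    """Start from a uniform prior and apply one Bayesian update per accuracy value."""
    rng = random.Random(seed)
    x_index = AMINO_ACIDS.index(true_residue)
    log_theta = [-math.log(K)] * K  # uniform prior over the K residues
    for alpha in alphas:
        log_theta = bayesian_update(log_theta, sender_sample(x_index, alpha, rng))
    return [math.exp(lt) for lt in log_theta]


if __name__ == "__main__":
    # Hypothetical accuracy values; a real BFN derives them from an accuracy schedule.
    for total in (0.1, 1.0, 10.0):
        theta = run_updates("W", [total / 10] * 10, seed=0)
        print(f"total accuracy {total:5.1f}: p(W) = {theta[AMINO_ACIDS.index('W')]:.3f}")
```

As total accuracy grows, the distribution concentrates on the true residue; at low accuracy it stays close to uniform. In the full framework, as we understand it, a neural network (here ProtBFN) reads these per-position distributions for the whole sequence and predicts the residues, and generation repeats the same updates using samples from the network's own predictions in place of the unknown true residues.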
