Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations.

Affiliations

UT Austin, Department of Computer Science, Austin, TX, 78712, USA.

Intelligent Proteins, LLC, Austin, TX, 78712, USA.

Publication information

Nat Commun. 2024 Jul 23;15(1):6170. doi: 10.1038/s41467-024-49780-2.

Abstract

Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
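
The abstract names Thermodynamic Permutations as a data augmentation but does not define it. The following is a minimal sketch under one stated assumption: because folding free energy is a state function, ΔΔG values measured from the same wild-type residue at one position can be subtracted to give ΔΔG for any ordered pair of residues at that position, ΔΔG(a→b) = ΔΔG(wt→b) − ΔΔG(wt→a). The function name, argument names, and example values are hypothetical and are not taken from the authors' code or data.

```python
from itertools import permutations


def thermodynamic_permutations(wild_type, ddg_from_wt):
    """Expand ddG values measured from one wild-type residue at a single
    position into ddG values for every ordered residue pair.

    Assumption (not confirmed by the abstract): ddG is path independent,
    so ddG(a -> b) = ddG(wt -> b) - ddG(wt -> a).
    """
    # The wild type relative to itself has ddG = 0 by definition.
    relative = {wild_type: 0.0, **ddg_from_wt}
    return {
        (a, b): relative[b] - relative[a]
        for a, b in permutations(relative, 2)
    }


# Hypothetical measurements (kcal/mol) at one position with wild type "A".
augmented = thermodynamic_permutations("A", {"G": 1.0, "V": -0.5})
for (src, dst), ddg in sorted(augmented.items()):
    print(f"{src}->{dst}: {ddg:+.2f}")
```

In this sketch two measured mutations yield six ordered pairs, including reverse mutations back to the wild type. Under the stated assumption every pair is antisymmetric, so the augmented set contains as many destabilizing as stabilizing examples at a position, which is one plausible way such an augmentation could reduce label bias.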

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cda/11266546/1b9757d3865c/41467_2024_49780_Fig1_HTML.jpg
