Safe model based optimization balancing exploration and reliability for protein sequence design.

Authors

Takizawa Shuuki, Mori Keita, Tanishiki Naoto, Yoshimura Dai, Ohta Atsushi, Teramoto Reiji

Affiliation

Research Division, Chugai Pharmaceutical Co., Ltd, Yokohama, Japan.

Publication

Sci Rep. 2025 Jul 29;15(1):27568. doi: 10.1038/s41598-025-12568-5.

DOI:10.1038/s41598-025-12568-5

PMID:40730605

Abstract

Discovering proteins with desired functionalities through protein engineering is time-consuming. Offline model-based optimization (MBO) accelerates protein sequence design by exploring the vast protein sequence space with a trained proxy model. However, the proxy model often predicts excessively good values for sequences far from the training dataset, which causes pathological behavior in MBO. To address this problem, we propose the mean deviation tree-structured Parzen estimator (MD-TPE). Its objective function uses the deviation of the predictive distribution of a Gaussian process (GP) model to penalize unreliable samples in the out-of-distribution region, so that solutions are found in the vicinity of the training data, where the proxy model can predict reliably. On the GFP dataset, MD-TPE yielded fewer pathological samples than TPE. It also identified mutants with higher binding affinity in an antibody affinity maturation task. Thus, our safe optimization approach is useful for protein engineering.
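
The penalty idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact objective, so the form used here (GP predictive mean minus a weighted predictive standard deviation) is an assumption, as are the function names, the `penalty_weight` parameter, the RBF kernel, and the use of scalar inputs in place of encoded protein sequences. The TPE sampler itself is not shown; this only demonstrates how such a score would favor candidates near the training data.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def solve_lower_transpose(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at one query point."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y_train))
    k_star = [rbf(x_query, xi) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_query, x_query) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)

def mean_deviation_score(x_train, y_train, x_query, penalty_weight=1.0):
    """Assumed penalized objective: predictive mean minus weighted deviation."""
    mean, std = gp_posterior(x_train, y_train, x_query)
    return mean - penalty_weight * std

if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0], [0.5, 1.0, 0.7]  # toy training data
    for x in (1.0, 10.0):  # one in-distribution and one far-away candidate
        mean, std = gp_posterior(xs, ys, x)
        print(x, round(mean, 3), round(std, 3),
              round(mean_deviation_score(xs, ys, x), 3))
```

In this toy setup the candidate far from the training points has a predictive deviation close to the prior standard deviation, so its score is pushed down, while a candidate at a training point is barely penalized. A sampler maximizing such a score would therefore be steered toward regions where the proxy model is better supported by data.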
