Safe model based optimization balancing exploration and reliability for protein sequence design.

Author information

Takizawa Shuuki, Mori Keita, Tanishiki Naoto, Yoshimura Dai, Ohta Atsushi, Teramoto Reiji

Affiliation

Research Division, Chugai Pharmaceutical Co., Ltd, Yokohama, Japan.

Publication information

Sci Rep. 2025 Jul 29;15(1):27568. doi: 10.1038/s41598-025-12568-5.

Abstract

Discovering proteins with desired functionalities using protein engineering is time-consuming. Offline Model-Based Optimization (MBO) accelerates protein sequence design by exploring the vast protein sequence space using a trained proxy model. However, the proxy model often yields excessively good values that are far from the training dataset and causes pathological behavior in the MBO. To address this problem, we propose a mean deviation tree-structured Parzen estimator (MD-TPE) that penalizes unreliable samples located in the out-of-distribution region using the deviation of the predictive distribution of the Gaussian process (GP) model in the objective function to find the solution in the vicinity of the training data, where the proxy model can reliably predict. Upon examining the GFP dataset, compared to TPE, MD-TPE yielded fewer pathological samples. Additionally, it successfully identified mutants with higher binding affinity in the antibody affinity maturation task. Thus, our developed safe optimization approach is useful for protein engineering.
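
The penalty described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the form `mean - weight * std`, the `weight` parameter, the RBF kernel, and all function names below are assumptions made for illustration. Sequences are assumed to be already encoded as numeric feature vectors, and the TPE sampler that would maximize this objective is not shown.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two numeric feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_predict(X, y, x_new, noise=1e-6, length=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    n = len(X)
    K = [[rbf(X[i], X[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(xi, x_new, length) for xi in X]
    alpha = solve(K, y)        # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new, length) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error

def md_objective(X, y, x_new, weight=1.0):
    """Assumed penalized objective: predicted mean minus weighted deviation."""
    mean, std = gp_predict(X, y, x_new)
    return mean - weight * std

# Toy training data: three encoded "sequences" with measured scores.
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.1, 0.5, 0.9]

near = md_objective(X_train, y_train, [1.0])   # inside the training region
far = md_objective(X_train, y_train, [10.0])   # far out of distribution
print(near > far)
```

In this toy setting the candidate far from the training data has a predictive deviation close to the prior value, so it scores lower than a candidate near the data, which is the behavior the abstract attributes to MD-TPE.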
