In protein engineering, computational models are increasingly used to predict mutation effects, but they are evaluated primarily on high-throughput deep mutational scanning (DMS) experiments that use surrogate readouts, which may not adequately capture the complex biochemical properties of interest. Many proteins and their functions cannot be assessed through high-throughput methods because of technical limitations or the nature of the desired properties, and this is particularly true in real industrial applications. The ideal test data would therefore consist of small-scale experimental datasets (∼10-100 data points) for each protein, covering as many proteins and as many properties as possible; such a resource, however, has been lacking. Here, we present VenusMutHub, a comprehensive benchmark study using 905 small-scale experimental datasets curated from published literature and public databases, spanning 527 proteins across diverse functional properties including stability, activity, binding affinity, and selectivity. These datasets feature direct biochemical measurements rather than surrogate readouts, providing a more rigorous assessment of how well models predict mutations that affect specific molecular functions. We evaluate 23 computational models across various methodological paradigms, including sequence-based, structure-informed, and evolutionary approaches. This benchmark provides practical guidance for selecting appropriate prediction methods in protein engineering applications where accurate prediction of specific functional properties is crucial.

VenusMutHub: A systematic evaluation of protein mutation effect predictors on small-scale experimental data.

Author information

Zhang Liang, Pang Hua, Zhang Chenghao, Li Song, Tan Yang, Jiang Fan, Li Mingchen, Yu Yuanxi, Zhou Ziyi, Wu Banghao, Zhou Bingxin, Liu Hao, Tan Pan, Hong Liang

Affiliations

School of Physics and Astronomy & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai National Centre for Applied Mathematics (SJTU Center), MOE-LSC, Shanghai 200240, China.

Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 201203, China.

Publication information

Acta Pharm Sin B. 2025 May;15(5):2454-2467. doi: 10.1016/j.apsb.2025.03.028. Epub 2025 Mar 14.
