VenusMutHub: A systematic evaluation of protein mutation effect predictors on small-scale experimental data.

Authors

Zhang Liang, Pang Hua, Zhang Chenghao, Li Song, Tan Yang, Jiang Fan, Li Mingchen, Yu Yuanxi, Zhou Ziyi, Wu Banghao, Zhou Bingxin, Liu Hao, Tan Pan, Hong Liang

Affiliations

School of Physics and Astronomy & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai National Centre for Applied Mathematics (SJTU Center), MOE-LSC, Shanghai 200240, China.

Zhangjiang Institute for Advanced Study, Shanghai Jiao Tong University, Shanghai 201203, China.

Publication information

Acta Pharm Sin B. 2025 May;15(5):2454-2467. doi: 10.1016/j.apsb.2025.03.028. Epub 2025 Mar 14.

DOI: 10.1016/j.apsb.2025.03.028
PMID: 40487668
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12144947/

Abstract

In protein engineering, while computational models are increasingly used to predict mutation effects, their evaluations primarily rely on high-throughput deep mutational scanning (DMS) experiments that use surrogate readouts, which may not adequately capture the complex biochemical properties of interest. Many proteins and their functions cannot be assessed through high-throughput methods due to technical limitations or the nature of the desired properties, and this is particularly true in real industrial application scenarios. Therefore, the desired testing datasets would consist of small-scale (∼10-100 data points) experimental data for each protein and would cover as many proteins and as many properties as possible; such datasets, however, are lacking. Here, we present VenusMutHub, a comprehensive benchmark study using 905 small-scale experimental datasets curated from published literature and public databases, spanning 527 proteins across diverse functional properties including stability, activity, binding affinity, and selectivity. These datasets feature direct biochemical measurements rather than surrogate readouts, providing a more rigorous assessment of model performance in predicting mutations that affect specific molecular functions. We evaluate 23 computational models across various methodological paradigms, such as sequence-based, structure-informed, and evolutionary approaches. This benchmark provides practical guidance for selecting appropriate prediction methods in protein engineering applications where accurate prediction of specific functional properties is crucial.
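
As an illustration of what evaluating a predictor on many small datasets can look like, the sketch below scores one model per dataset with a rank correlation. The abstract does not state which metric the study uses, so Spearman correlation is an assumption here, and the names `benchmark`, `predict`, and the toy dataset are hypothetical and not taken from VenusMutHub.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def benchmark(datasets, predict):
    """Score a predictor on each small dataset separately.

    datasets: {name: [(mutation, measured_value), ...]}  (hypothetical layout)
    predict:  callable mapping a mutation string to a predicted score
    """
    scores = {}
    for name, rows in datasets.items():
        measured = [value for _, value in rows]
        predicted = [predict(mutation) for mutation, _ in rows]
        scores[name] = spearman(predicted, measured)
    return scores


# Toy example with made-up numbers, for illustration only.
toy = {"toy_stability": [("A1G", 0.5), ("L2P", -1.2), ("K3R", 0.1)]}
toy_scores = benchmark(toy, {"A1G": 0.9, "L2P": -0.3, "K3R": 0.2}.get)
```

Scoring each dataset on its own, instead of pooling all measurements, keeps proteins with different assays and units from being compared on one scale; with only about 10-100 points per dataset, each individual correlation is noisy, so summaries across many datasets are more informative than any single score.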
