Enzyme structure correlates with variant effect predictability.

Authors

van der Flier Floris, Estell Dave, Pricelius Sina, Dankmeyer Lydia, van Stigt Thans Sander, Mulder Harm, Otsuka Rei, Goedegebuur Frits, Lammerts Laurens, Staphorst Diego, van Dijk Aalt D J, de Ridder Dick, Redestig Henning

Affiliations

Department of Plant Sciences, Wageningen University & Research, Wageningen, 6708 PB, the Netherlands.

Health & Biosciences, International Flavors and Fragrances, Palo Alto, 94304 CA, USA.

Publication information

Comput Struct Biotechnol J. 2024 Oct 2;23:3489-3497. doi: 10.1016/j.csbj.2024.09.007. eCollection 2024 Dec.

DOI:10.1016/j.csbj.2024.09.007
PMID:39435338
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11491678/
Abstract

Protein engineering increasingly relies on machine learning models to computationally pre-screen promising novel candidates. Although machine learning approaches have proven effective, their performance on prospective screening data leaves room for improvement: prediction accuracy can vary greatly from one protein variant to the next. So far, it is unclear what characterizes variants that are associated with large prediction errors. To establish whether structural characteristics influence predictability, we created a novel high-order combinatorial dataset for an enzyme spanning 3,706 variants that can be partitioned into subsets of variants with mutations at positions exclusively belonging to a particular structural class. By training four different supervised variant effect prediction (VEP) models on structurally partitioned subsets of our data, we found that predictability strongly depended on all four structural characteristics we tested: buriedness, number of contact residues, proximity to the active site, and presence of secondary structure elements. These dependencies were also found in several single-mutation enzyme variant datasets, albeit with dataset-specific directions. Most importantly, we found that these dependencies were similar for all four models we tested, indicating that there are specific structure and function determinants that are insufficiently accounted for by current machine learning algorithms. Overall, our findings suggest that improvements can be made to VEP models by exploring new inductive biases and by leveraging different data modalities of protein variants, and that stratified dataset design can highlight areas of improvement for machine-learning-guided protein engineering.
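
The stratified evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy variants, the position annotations, and the choice of Spearman rank correlation as the predictability score are all assumptions made for illustration. Each variant is assigned to a structural class only if every mutated position belongs to that class; variants spanning several classes are left out, and the score is then computed per class.

```python
from collections import defaultdict
from math import sqrt


def ranks(values):
    """Return 1-based average ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def stratum_of(positions, position_class):
    """Return the single class shared by all mutated positions, else None."""
    classes = {position_class[p] for p in positions}
    return classes.pop() if len(classes) == 1 else None


def stratified_spearman(variants, position_class):
    """Score predictions separately for each structurally pure subset."""
    groups = defaultdict(lambda: ([], []))
    for positions, measured, predicted in variants:
        label = stratum_of(positions, position_class)
        if label is not None:  # skip variants that mix structural classes
            groups[label][0].append(measured)
            groups[label][1].append(predicted)
    return {label: spearman(m, p) for label, (m, p) in groups.items()}


# Hypothetical toy data: position -> structural class (here, buriedness).
position_class = {10: "buried", 25: "buried", 40: "exposed", 77: "exposed"}

# Hypothetical variants: (mutated positions, measured value, model prediction).
variants = [
    ((10,), 0.2, 0.3),
    ((25,), 0.5, 0.4),
    ((10, 25), 0.9, 0.8),
    ((40,), 0.1, 0.9),
    ((77,), 0.6, 0.5),
    ((40, 77), 0.8, 0.2),
    ((10, 40), 0.7, 0.7),  # mixes buried and exposed, so it is excluded
]

scores = stratified_spearman(variants, position_class)
for label, rho in sorted(scores.items()):
    print(f"{label}: Spearman rho = {rho:.2f}")
```

In this toy setup the predictions rank the buried-only variants correctly and the exposed-only variants in reverse, so the two strata receive very different scores even though a single pooled score would hide that difference. The same partitioning could be repeated for other annotations named in the abstract, such as contact count, distance to the active site, or secondary structure.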


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99de/11491678/c2a9f151aa4e/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99de/11491678/1d24a2872bc6/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99de/11491678/cf4e1fe0c9e3/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99de/11491678/df75adf2825c/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/99de/11491678/7660b590cce7/gr005.jpg
