Benchmarking uncertainty quantification for protein engineering.

Author information

Greenman Kevin P, Amini Ava P, Yang Kevin K

Affiliations

Department of Chemical Engineering, Catholic Institute of Technology, Cambridge, Massachusetts, United States of America.

Department of Chemistry, Catholic Institute of Technology, Cambridge, Massachusetts, United States of America.

Publication information

PLoS Comput Biol. 2025 Jan 7;21(1):e1012639. doi: 10.1371/journal.pcbi.1012639. eCollection 2025 Jan.

DOI:10.1371/journal.pcbi.1012639
PMID:39775201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11741572/
Abstract

Machine learning sequence-function models for proteins could enable significant advances in protein engineering, especially when paired with state-of-the-art methods to select new sequences for property optimization and/or model improvement. Such methods (Bayesian optimization and active learning) require calibrated estimations of model uncertainty. While studies have benchmarked a variety of deep learning uncertainty quantification (UQ) methods on standard and molecular machine-learning datasets, it is not clear if these results extend to protein datasets. In this work, we implemented a panel of deep learning UQ methods on regression tasks from the Fitness Landscape Inference for Proteins (FLIP) benchmark. We compared results across different degrees of distributional shift using metrics that assess each UQ method's accuracy, calibration, coverage, width, and rank correlation. Additionally, we compared these metrics using one-hot encoding and pretrained language model representations, and we tested the UQ methods in retrospective active learning and Bayesian optimization settings. Our results indicate that there is no single best UQ method across all datasets, splits, and metrics, and that uncertainty-based sampling is often unable to outperform greedy sampling in Bayesian optimization. These benchmarks enable us to provide recommendations for more effective design of biological sequences using machine learning.

Figures

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/cb77e741848e/pcbi.1012639.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/fc14d0653bdb/pcbi.1012639.g002.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/3fa933564489/pcbi.1012639.g003.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/523ca7393e65/pcbi.1012639.g004.jpg
Fig 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/aef2c6dc26c6/pcbi.1012639.g005.jpg
Fig 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ab1f/11741572/85f4a4c96553/pcbi.1012639.g006.jpg

Similar articles

1. Benchmarking uncertainty quantification for protein engineering.
   PLoS Comput Biol. 2025 Jan 7;21(1):e1012639. doi: 10.1371/journal.pcbi.1012639. eCollection 2025 Jan.
2. UNIQUE: A Framework for Uncertainty Quantification Benchmarking.
   J Chem Inf Model. 2024 Nov 25;64(22):8379-8386. doi: 10.1021/acs.jcim.4c01578. Epub 2024 Nov 14.
3. Uncertainty quantification in skin cancer classification using three-way decision-based Bayesian deep learning.
   Comput Biol Med. 2021 Aug;135:104418. doi: 10.1016/j.compbiomed.2021.104418. Epub 2021 Apr 28.
4. Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability.
   J Chem Inf Model. 2023 Aug 14;63(15):4589-4601. doi: 10.1021/acs.jcim.3c00601. Epub 2023 Jul 27.
5. Uncertainty quantification via localized gradients for deep learning-based medical image assessments.
   Phys Med Biol. 2024 Jul 19;69(15). doi: 10.1088/1361-6560/ad611d.
6. Uncertainty quantification in multivariable regression for material property prediction with Bayesian neural networks.
   Sci Rep. 2024 May 8;14(1):10543. doi: 10.1038/s41598-024-61189-x.
7. Bayesian Active Learning for Optimization and Uncertainty Quantification in Protein Docking.
   J Chem Theory Comput. 2020 Aug 11;16(8):5334-5347. doi: 10.1021/acs.jctc.0c00476. Epub 2020 Jul 6.
8. A universal similarity based approach for predictive uncertainty quantification in materials science.
   Sci Rep. 2022 Sep 2;12(1):14931. doi: 10.1038/s41598-022-19205-5.
9. Flattening the curve-How to get better results with small deep-mutational-scanning datasets.
   Proteins. 2024 Jul;92(7):886-902. doi: 10.1002/prot.26686. Epub 2024 Mar 19.
10. Data-efficient Bayesian learning for radial dynamic MR reconstruction.
   Med Phys. 2023 Nov;50(11):6955-6977. doi: 10.1002/mp.16543. Epub 2023 Jun 27.

Cited by

1. Active learning-assisted directed evolution.
   Nat Commun. 2025 Jan 16;16(1):714. doi: 10.1038/s41467-025-55987-8.

References

1. Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability.
   J Chem Inf Model. 2023 Aug 14;63(15):4589-4601. doi: 10.1021/acs.jcim.3c00601. Epub 2023 Jul 27.
2. Evolutionary-scale prediction of atomic-level protein structure with a language model.
   Science. 2023 Mar 17;379(6637):1123-1130. doi: 10.1126/science.ade2574. Epub 2023 Mar 16.
3. Conformal prediction under feedback covariate shift for biomolecular design.
   Proc Natl Acad Sci U S A. 2022 Oct 25;119(43):e2204569119. doi: 10.1073/pnas.2204569119. Epub 2022 Oct 18.
4. Evaluating and Calibrating Uncertainty Prediction in Regression Tasks.
   Sensors (Basel). 2022 Jul 25;22(15):5540. doi: 10.3390/s22155540.
5. Empirical Frequentist Coverage of Deep Learning Uncertainty Quantification Procedures.
   Entropy (Basel). 2021 Nov 30;23(12):1608. doi: 10.3390/e23121608.
6. Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production.
   Nat Commun. 2021 Oct 5;12(1):5825. doi: 10.1038/s41467-021-25831-w.
7. Evidential Deep Learning for Guided Molecular Property Prediction and Discovery.
   ACS Cent Sci. 2021 Aug 25;7(8):1356-1367. doi: 10.1021/acscentsci.1c00546. Epub 2021 Jul 27.
8. Assigning confidence to molecular property prediction.
   Expert Opin Drug Discov. 2021 Sep;16(9):1009-1023. doi: 10.1080/17460441.2021.1925247. Epub 2021 Jun 15.
9. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
   Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
10. Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design.
   Cell Syst. 2020 Nov 18;11(5):461-477.e9. doi: 10.1016/j.cels.2020.09.007. Epub 2020 Oct 15.