
Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability.

Authors

Parkinson Jonathan, Wang Wei

Publication information

J Chem Inf Model. 2023 Aug 14;63(15):4589-4601. doi: 10.1021/acs.jcim.3c00601. Epub 2023 Jul 27.

DOI: 10.1021/acs.jcim.3c00601
PMID: 37498239

Abstract

Gaussian processes (GPs) are Bayesian models that provide several advantages for regression tasks in machine learning, such as reliable quantitation of uncertainty and improved interpretability. Their adoption has been precluded by their excessive computational cost and by the difficulty of adapting them to analyze sequences (e.g., amino acid sequences) and graphs (e.g., small molecules). In this study, we introduce a group of random feature-approximated kernels for sequences and graphs that exhibit linear scaling with both the size of the training set and the size of the sequences or graphs. We incorporate these new kernels into our new Python library for GP regression, xGPR, and develop an efficient and scalable algorithm for fitting GPs equipped with these kernels to large datasets. We compare the performance of xGPR on 17 different benchmarks with both standard and state-of-the-art deep learning models and find that GP regression achieves highly competitive accuracy for these tasks while providing well-calibrated uncertainty quantitation and improved interpretability. Finally, in a simple experiment, we illustrate how xGPR may be used as part of an active learning strategy to engineer a protein with a desired property in an automated way without human intervention.
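To make the idea of a random feature-approximated kernel concrete, the following is a minimal sketch, not the xGPR API and not the sequence or graph kernels introduced in the paper. It assumes fixed-length numeric inputs and swaps in the standard random Fourier feature approximation of an RBF kernel, followed by Bayesian linear regression in the feature space. All function names and parameters (`random_fourier_features`, `fit_rff_gp`, `predict_rff_gp`, `n_features`, `lengthscale`, `noise`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def random_fourier_features(X, W, b):
    """Map inputs to random features whose inner products approximate an RBF kernel."""
    n_features = W.shape[1]
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def fit_rff_gp(X, y, n_features=256, lengthscale=1.0, noise=0.1, seed=0):
    """Fit an approximate GP: Bayesian linear regression on random features.

    Illustrative assumption: unit Gaussian prior on the weights and a fixed
    noise standard deviation. For a fixed number of features, the cost grows
    linearly with the number of training points.
    """
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = random_fourier_features(X, W, b)              # shape (n, n_features)
    A = Z.T @ Z + noise**2 * np.eye(n_features)       # shape (n_features, n_features)
    chol = np.linalg.cholesky(A)                      # A = chol @ chol.T
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, Z.T @ y))
    return W, b, chol, weights, noise


def predict_rff_gp(model, X_new):
    """Return the posterior mean and the variance of the latent function."""
    W, b, chol, weights, noise = model
    Z = random_fourier_features(X_new, W, b)
    mean = Z @ weights
    # Posterior variance is noise^2 * z^T A^{-1} z = noise^2 * ||chol^{-1} z||^2.
    v = np.linalg.solve(chol, Z.T)
    var = noise**2 * np.sum(v**2, axis=0)
    return mean, var


# Toy usage: learn sin(x) on [-3, 3], then query inside and far outside that range.
X_train = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
y_train = np.sin(X_train[:, 0])
model = fit_rff_gp(X_train, y_train)
mean, var = predict_rff_gp(model, np.array([[0.5], [10.0]]))
```

In this toy setup the predicted variance should be small at the query inside the training range and much larger at the query far from it, which is the kind of uncertainty signal the abstract refers to. Handling variable-length sequences or molecular graphs requires the specialized kernels described in the paper, which this sketch does not attempt to reproduce.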
