KaMLs for Predicting Protein pKa Values and Ionization States: Are Trees All You Need?

Authors

Shen Mingzhe, Kortzak Daniel, Ambrozak Simon, Bhatnagar Shubham, Buchanan Ian, Liu Ruibin, Shen Jana

Affiliations

Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States.

Department of Computer Science, University of Maryland College Park, College Park, Maryland 20742, United States.

Publication information

J Chem Theory Comput. 2025 Feb 11;21(3):1446-1458. doi: 10.1021/acs.jctc.4c01602. Epub 2025 Jan 30.

DOI:10.1021/acs.jctc.4c01602
PMID:39882632
Abstract

Despite its importance in understanding biology and in computer-aided drug discovery, accurate prediction of protein ionization states remains a formidable challenge. Physics-based approaches struggle to capture the small, competing contributions in the complex protein environment, while machine learning (ML) is hampered by the scarcity of experimental data. Here, we report the development of pKa ML (KaML) models based on decision trees and graph attention networks (GAT), exploiting physicochemical understanding and a new experimental pKa database (PKAD-3) enriched with highly shifted pKa's. KaML-CBtree significantly outperforms the current state of the art in predicting pKa values and ionization states across all six titratable amino acids, notably achieving accurate predictions for deprotonated cysteines and lysines, a blind spot in previous models. The superior performance of KaMLs is achieved in part through several innovations, including the separate treatment of acids and bases, data augmentation using AlphaFold structures, and model pretraining on a theoretical pKa database. We also introduce the classification of protonation states as a metric for evaluating pKa prediction models. A meta-feature analysis suggests a possible reason why the lightweight tree model outperforms the more complex deep learning GAT. We release an end-to-end pKa predictor based on KaML-CBtree and the new PKAD-3 database, which facilitates a variety of applications and provides the foundation for further advances in protein electrostatics research.
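The abstract proposes classifying protonation states as a metric for pKa predictors, in addition to pKa error. The following is a minimal Python sketch of that idea, not KaML's code or API. It assumes a fixed reference pH of 7.0, the Henderson-Hasselbalch relation f_deprot = 1 / (1 + 10^(pKa - pH)), a conventional acid/base grouping of the six titratable residues, and "protonated" as the positive class. All function names and the (predicted, experimental) pKa pairs are hypothetical illustrations, and the paper's exact metric definitions may differ.

```python
import math

# Assumption: conventional grouping of the six titratable amino acids.
ACIDS = {"ASP", "GLU", "CYS", "TYR"}  # neutral when protonated, anionic when deprotonated
BASES = {"HIS", "LYS"}                # cationic when protonated, neutral when deprotonated


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of the site in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def is_protonated(pka: float, ph: float = 7.0) -> bool:
    """Majority species carries the proton when pKa > pH (fraction_deprotonated < 0.5)."""
    return pka > ph


def is_ionized(resname: str, pka: float, ph: float = 7.0) -> bool:
    """Acids are charged when deprotonated; bases are charged when protonated."""
    if resname in ACIDS:
        return not is_protonated(pka, ph)
    if resname in BASES:
        return is_protonated(pka, ph)
    raise ValueError(f"not a titratable residue: {resname}")


def classification_metrics(pairs, ph: float = 7.0) -> dict:
    """Score (predicted_pka, experimental_pka) pairs by protonation state at `ph`.

    Positive class = protonated. Returns accuracy, precision and recall;
    precision or recall is NaN when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for pred, expt in pairs:
        p, e = is_protonated(pred, ph), is_protonated(expt, ph)
        if p and e:
            tp += 1
        elif p:
            fp += 1  # predicted protonated, experimentally deprotonated
        elif e:
            fn += 1  # predicted deprotonated, experimentally protonated
        else:
            tn += 1
    n = tp + fp + tn + fn
    if n == 0:
        raise ValueError("no pKa pairs to score")
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp else math.nan,
        "recall": tp / (tp + fn) if tp + fn else math.nan,
    }


# Hypothetical (predicted, experimental) pKa pairs for illustration only.
pairs = [(4.1, 3.9), (7.8, 6.5), (9.6, 10.4), (8.2, 8.9)]
print(classification_metrics(pairs))
```

With these made-up pairs, only the second prediction lands on the wrong side of pH 7 (7.8 predicted vs 6.5 measured), giving accuracy 0.75, precision 2/3 and recall 1.0. Because the metric scores the sign of pKa - pH rather than its magnitude, a modest error that crosses the reference pH counts against a model, while a larger error that stays on the correct side does not.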

Similar articles

1
KaMLs for Predicting Protein pKa Values and Ionization States: Are Trees All You Need?
J Chem Theory Comput. 2025 Feb 11;21(3):1446-1458. doi: 10.1021/acs.jctc.4c01602. Epub 2025 Jan 30.
2
KaMLs for Predicting Protein pKa Values and Ionization States: Are Trees All You Need?
bioRxiv. 2025 Jan 30:2024.11.09.622800. doi: 10.1101/2024.11.09.622800.
3
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
5
Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
6
Classification of finger movements through optimal EEG channel and feature selection.
Front Hum Neurosci. 2025 Jul 16;19:1633910. doi: 10.3389/fnhum.2025.1633910. eCollection 2025.
7
Interventions to improve safe and effective medicines use by consumers: an overview of systematic reviews.
Cochrane Database Syst Rev. 2014 Apr 29;2014(4):CD007768. doi: 10.1002/14651858.CD007768.pub3.
8
Sexual Harassment and Prevention Training
9
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
10
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Cited by

1
Protein Electrostatic Properties are Fine-Tuned Through Evolution.
Res Sq. 2025 Apr 28:rs.3.rs-6471091. doi: 10.21203/rs.3.rs-6471091/v1.
2
Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning.
J Chem Theory Comput. 2025 May 13;21(9):4830-4845. doi: 10.1021/acs.jctc.4c01682. Epub 2025 Apr 24.
3
Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning.
