The impact of cross-docked poses on performance of machine learning classifier for protein-ligand binding pose prediction.

Authors

Shen Chao, Hu Xueping, Gao Junbo, Zhang Xujun, Zhong Haiyang, Wang Zhe, Xu Lei, Kang Yu, Cao Dongsheng, Hou Tingjun

Affiliations

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.

State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang, 310058, People's Republic of China.

Publication information

J Cheminform. 2021 Oct 16;13(1):81. doi: 10.1186/s13321-021-00560-w.

DOI: 10.1186/s13321-021-00560-w
PMID: 34656169
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8520186/
Abstract

Structure-based drug design depends on detailed knowledge of the three-dimensional (3D) structures of protein-ligand complexes, but accurate prediction of ligand-binding poses remains a major challenge for molecular docking because of the deficiencies of scoring functions (SFs) and the neglect of protein flexibility upon ligand binding. In this study, using a cross-docking dataset purpose-built from the PDBbind database, we developed several XGBoost-trained classifiers to discriminate near-native binding poses from decoys, and systematically assessed their performance with and without cross-docked poses in the training and test sets. The results show that using Extended Connectivity Interaction Features (ECIF), Vina energy terms and docking pose ranks as features achieves the best performance, both in validation by random splitting or refined-core splitting and in testing on re-docked or cross-docked poses. Although performance decreases significantly under threefold clustered cross-validation, including the Vina energy terms effectively secures a lower bound on model performance and thus improves generalization capability. Our results also highlight the importance of incorporating cross-docked poses into the training of SFs intended to have a wide application domain and high robustness for binding pose prediction. The source code and the newly developed cross-docking datasets are freely available at https://github.com/sc8668/ml_pose_prediction and https://zenodo.org/record/5525936, respectively, under an open-source license. We believe that our study may provide valuable guidance for the development and assessment of new machine learning-based SFs (MLSFs) for the prediction of protein-ligand binding poses.
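
The evaluation ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' code (that lives in the linked GitHub repository) and it trains no XGBoost model. It only shows, under stated assumptions, how poses might be labelled as near-native or decoy, how a top-1 success rate could be computed from classifier scores, and how a clustered split keeps whole clusters together across folds. The 2.0 Å RMSD cutoff, the top-1 success-rate metric, the greedy fold assignment, and all function names, complex IDs and cluster IDs are assumptions made for illustration; the abstract specifies none of them.

```python
from collections import defaultdict

# Assumed cutoff in angstroms; the abstract does not state the value used.
NEAR_NATIVE_RMSD = 2.0


def label_poses(rmsds, cutoff=NEAR_NATIVE_RMSD):
    """Label each pose 1 (near-native) or 0 (decoy) from its RMSD to the crystal pose."""
    return [1 if r <= cutoff else 0 for r in rmsds]


def top1_success_rate(poses, cutoff=NEAR_NATIVE_RMSD):
    """Fraction of complexes whose highest-scored pose is near-native.

    `poses` is an iterable of (complex_id, score, rmsd) tuples; a higher score
    means the classifier rates the pose as more likely to be near-native.
    """
    best = {}
    for complex_id, score, rmsd in poses:
        if complex_id not in best or score > best[complex_id][0]:
            best[complex_id] = (score, rmsd)
    if not best:
        return 0.0
    hits = sum(1 for _, rmsd in best.values() if rmsd <= cutoff)
    return hits / len(best)


def clustered_folds(cluster_of, n_folds=3):
    """Assign complexes to folds so that no cluster is split across folds.

    `cluster_of` maps complex_id -> cluster_id. Clusters are placed greedily,
    largest first, into whichever fold is currently smallest.
    """
    members = defaultdict(list)
    for complex_id, cluster_id in cluster_of.items():
        members[cluster_id].append(complex_id)
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster_id in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(members[cluster_id]))
    return folds


if __name__ == "__main__":
    # Hypothetical scores and RMSDs for three made-up complexes.
    example = [
        ("cplx1", 0.91, 1.2), ("cplx1", 0.40, 5.8),
        ("cplx2", 0.75, 4.1), ("cplx2", 0.60, 1.7),
        ("cplx3", 0.88, 0.9), ("cplx3", 0.15, 7.3),
    ]
    print(top1_success_rate(example))
    print(clustered_folds({"cplx1": "A", "cplx2": "A", "cplx3": "B", "cplx4": "C"}))
```

Keeping each cluster inside a single fold is what makes clustered cross-validation harder than random splitting: the model is always tested on complexes unlike any it was trained on, which is consistent with the performance drop the abstract reports for that setting.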


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/7cc11c9c017c/13321_2021_560_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/f8872c4c54be/13321_2021_560_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/48d22ee4734e/13321_2021_560_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/ebd44cabcb96/13321_2021_560_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/c5240e97725f/13321_2021_560_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/83a8d7391014/13321_2021_560_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/dc61685f382c/13321_2021_560_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/7ccf1b21e697/13321_2021_560_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73aa/8520186/5e386660323a/13321_2021_560_Fig9_HTML.jpg
