SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation.

Affiliations

Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK.

Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain.

Publication information

J Adv Res. 2023 Apr;46:135-147. doi: 10.1016/j.jare.2022.07.001. Epub 2022 Jul 25.

DOI: 10.1016/j.jare.2022.07.001
PMID: 35901959
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10105235/

Abstract

INTRODUCTION

The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystallised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions.

OBJECTIVES

To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening.

METHODS

A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance.
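
The two ideas in this paragraph, RMSD-based pose labelling and a consensus over several models, can be illustrated with a minimal sketch. It is not the SCORCH implementation: the 2.0 Å cutoff, the rule that every decoy pose is labelled negative, the plain averaging of model outputs, and the choice of the best-scoring pose per ligand are all assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import mean

# Assumed cutoff in angstroms; the abstract does not state the value used.
RMSD_CUTOFF = 2.0


def label_poses(pose_rmsds, is_active, cutoff=RMSD_CUTOFF):
    """Label each docked pose: 1 if it is a near-native pose of a true ligand, else 0.

    Assumption: all decoy poses are negatives, whatever their geometry.
    """
    if not is_active:
        return [0] * len(pose_rmsds)
    return [1 if rmsd <= cutoff else 0 for rmsd in pose_rmsds]


def consensus_score(model_probs):
    """Average the probabilities that several models assign to one pose."""
    return mean(model_probs)


def ligand_score(per_pose_model_probs):
    """Score a ligand by its best consensus score over all poses (assumed rule)."""
    return max(consensus_score(probs) for probs in per_pose_model_probs)


if __name__ == "__main__":
    # Three poses of one active ligand, with RMSD to the native pose in angstroms.
    print(label_poses([0.5, 1.9, 3.2], is_active=True))
    # Two poses, each scored by three hypothetical models.
    print(ligand_score([[0.9, 0.6, 0.3], [0.2, 0.2, 0.2]]))
```

Labelling several poses per complex turns one crystal structure into many training examples, which is the augmentation the paragraph describes.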

RESULTS

We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with an enrichment factor at 1% (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets.
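
For readers unfamiliar with the metric, the enrichment factor reported above can be computed as in the sketch below. It uses the standard definition (hit rate in the top-ranked fraction divided by the hit rate in the whole library). The function name and the rounding of the top fraction up to a whole compound are assumptions, and the data in the usage example are made up.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor of the top `fraction` of a ranked library.

    scores: predicted scores, higher meaning more likely active.
    labels: 1 for known actives, 0 for decoys.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Assumed convention: round the top fraction up, keeping at least one compound.
    n_top = max(1, math.ceil(n_total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library: 200 compounds, 10 actives, 2 of them ranked first.
    scores = list(range(200, 0, -1))
    labels = [1] * 2 + [0] * 190 + [1] * 8
    print(enrichment_factor(scores, labels, fraction=0.01))
```

An EF of 1 corresponds to random ranking, so values of 13.78 and 10.86 mean the top 1% contains roughly 14 and 11 times more actives than chance would give.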

CONCLUSION

By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f30f/10105235/fd732b5432dc/ga1.jpg
