Applying Machine Learning to Ultrafast Shape Recognition in Ligand-Based Virtual Screening.

Author information

Bonanno Etienne, Ebejer Jean-Paul

Affiliations

Department of Artificial Intelligence, University of Malta, Msida, Malta.

Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta.

Publication information

Front Pharmacol. 2020 Feb 19;10:1675. doi: 10.3389/fphar.2019.01675. eCollection 2019.

DOI:10.3389/fphar.2019.01675

PMID:32140104

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7042174/

Abstract

Ultrafast Shape Recognition (USR) and its derivatives are Ligand-Based Virtual Screening (LBVS) methods that condense 3-dimensional information about molecular shape, as well as other properties, into a small set of numeric descriptors. These descriptors can be used to efficiently compute a similarity measure between pairs of molecules using a simple inverse Manhattan distance metric. In this study we explore the use of suitable Machine Learning techniques that can be trained on USR descriptors, so as to improve the similarity detection of potential new leads. We use molecules from the Directory of Useful Decoys, Enhanced (DUD-E) to construct machine learning models based on three different algorithms: Gaussian Mixture Models (GMMs), Isolation Forests and Artificial Neural Networks (ANNs). We train models on full molecule conformer models, as well as on the Lowest Energy Conformations (LECs) only. We also investigate the performance of our models when trained on smaller datasets, so as to model virtual screening scenarios in which only a small number of actives are known. Our results indicate significant performance gains over a state-of-the-art USR-derived method, ElectroShape 5D, with GMMs obtaining a mean Enrichment Factor up to 430% better than that of ElectroShape 5D and a maximum improvement of up to 940%. Additionally, we demonstrate that our models maintain their enrichment factor within 10% of the mean as the size of the training dataset is successively reduced. Furthermore, we demonstrate that running times for retrospective screening using the selected machine learning models are shorter than those of standard USR, on average by a factor of 10, including the time required for training. Our results show that machine learning techniques can significantly improve the virtual screening performance and efficiency of the USR family of methods.
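To make the descriptor comparison and the evaluation metric in the abstract concrete, the following Python sketch computes USR-style shape descriptors for one conformer, compares two descriptor vectors with the inverse Manhattan distance score, and computes an enrichment factor for a ranked screening list. It is an illustration under stated assumptions, not the authors' code: it assumes the classic 12-value USR layout (distance distributions measured from the molecular centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), takes mean, variance and the signed cube root of the third central moment as the three moments per distribution (published implementations normalise these differently), and omits the charge and lipophilicity dimensions that ElectroShape 5D adds. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(dists: Sequence[float]) -> List[float]:
    """Mean, variance and signed cube root of the third central moment (assumed normalisation)."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    m3 = sum((d - mean) ** 3 for d in dists) / n
    # Cube root keeps the sign of the skew and gives it the same units as distance.
    return [mean, var, math.copysign(abs(m3) ** (1.0 / 3.0), m3)]


def usr_descriptors(coords: Sequence[Point]) -> List[float]:
    """12 USR-style shape descriptors for one conformer (list of x, y, z atom positions)."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # molecular centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptors: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptors.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptors


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """1 / (1 + mean absolute difference): 1.0 for identical descriptors, towards 0 as they diverge."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have equal length")
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + manhattan / len(a))


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """EF at the top `fraction` of a ranked list: hit rate in that slice / hit rate overall."""
    n = len(ranked_is_active)
    n_top = max(1, round(n * fraction))
    total_actives = sum(ranked_is_active)
    if total_actives == 0:
        raise ValueError("the ranked list contains no actives")
    top_actives = sum(ranked_is_active[:n_top])
    return (top_actives / n_top) / (total_actives / n)


if __name__ == "__main__":
    conformer = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 3.0)]
    d = usr_descriptors(conformer)
    print(len(d))                # 12
    print(usr_similarity(d, d))  # 1.0
    ranked = [True, True, False, False, True, False, False, False, False, False]
    # 2 of the top 2 are active vs 3 of 10 overall, so EF is about 3.33
    print(enrichment_factor(ranked, 0.2))
```

Ranking a library by `usr_similarity` to a known active and passing the resulting active/decoy labels, in rank order, to `enrichment_factor` gives a plain-USR screen. In the study's setting, a model trained on the descriptors of known actives (for example a GMM) would presumably supply the ranking score in place of the pairwise similarity.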

Similar articles

ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics.

J Comput Aided Mol Des. 2010 Sep;24(9):789-801. doi: 10.1007/s10822-010-9374-0. Epub 2010 Jul 8.

Improving the accuracy of ultrafast ligand-based screening: incorporating lipophilicity into ElectroShape as an extra dimension.

J Comput Aided Mol Des. 2011 Aug;25(8):785-90. doi: 10.1007/s10822-011-9463-8. Epub 2011 Aug 6.

Ligity: A Non-Superpositional, Knowledge-Based Approach to Virtual Screening.

J Chem Inf Model. 2019 Jun 24;59(6):2600-2616. doi: 10.1021/acs.jcim.8b00779. Epub 2019 Jun 4.

Ultrafast shape recognition: evaluating a new ligand-based virtual screening technology.

J Mol Graph Model. 2009 Apr;27(7):836-45. doi: 10.1016/j.jmgm.2009.01.001. Epub 2009 Jan 14.

Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning.

Molecules. 2023 Aug 9;28(16):5982. doi: 10.3390/molecules28165982.

USR-VS: a web server for large-scale prospective virtual screening using ultrafast shape recognition techniques.

Nucleic Acids Res. 2016 Jul 8;44(W1):W436-41. doi: 10.1093/nar/gkw320. Epub 2016 Apr 22.

USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints.

J Cheminform. 2012 Nov 6;4(1):27. doi: 10.1186/1758-2946-4-27.

Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening.

PLoS One. 2019 Aug 20;14(8):e0220113. doi: 10.1371/journal.pone.0220113. eCollection 2019.

Application of 3D Zernike descriptors to shape-based ligand similarity searching.

J Cheminform. 2009 Dec 17;1:19. doi: 10.1186/1758-2946-1-19.

Cited by

Hypershape Recognition: A General Framework for Moment-Based Molecular Similarity.

J Chem Inf Model. 2025 Jun 23;65(12):5960-5972. doi: 10.1021/acs.jcim.5c00555. Epub 2025 Jun 11.

New strategies to enhance the efficiency and precision of drug discovery.

Front Pharmacol. 2025 Feb 11;16:1550158. doi: 10.3389/fphar.2025.1550158. eCollection 2025.

Benzoyl Valine Quasiracemates: Pairing CF3 Quasienantiomers with H to tert-Butyl.

Cryst Growth Des. 2024 Apr 15;24(9):3967-3976. doi: 10.1021/acs.cgd.4c00307. eCollection 2024 May 1.

Computational Chemistry for the Identification of Lead Compounds for Radiotracer Development.

Pharmaceuticals (Basel). 2023 Feb 18;16(2):317. doi: 10.3390/ph16020317.

A multi-reference poly-conformational method for design, optimization, and repositioning of pharmaceutical compounds illustrated for selected SARS-CoV-2 ligands.

PeerJ. 2022 Nov 24;10:e14252. doi: 10.7717/peerj.14252. eCollection 2022.

Novel Efficient Multistage Lead Optimization Pipeline Experimentally Validated for DYRK1B Selective Inhibitors.

J Med Chem. 2022 Oct 27;65(20):13784-13792. doi: 10.1021/acs.jmedchem.2c00988. Epub 2022 Oct 14.

The Main Protease of SARS-CoV-2 as a Target for Phytochemicals against Coronavirus.

Plants (Basel). 2022 Jul 17;11(14):1862. doi: 10.3390/plants11141862.

Rapid Identification of Potential Drug Candidates from Multi-Million Compounds' Repositories. Combination of 2D Similarity Search with 3D Ligand/Structure Based Methods and In Vitro Screening.

Molecules. 2021 Sep 15;26(18):5593. doi: 10.3390/molecules26185593.

Applications of Virtual Screening in Bioprospecting: Facts, Shifts, and Perspectives to Explore the Chemo-Structural Diversity of Natural Products.

Front Chem. 2021 Apr 29;9:662688. doi: 10.3389/fchem.2021.662688. eCollection 2021.

References

Performance of machine-learning scoring functions in structure-based virtual screening.

Sci Rep. 2017 Apr 25;7:46710. doi: 10.1038/srep46710.

Probabilistic Modeling of Conformational Space for 3D Machine Learning Approaches.

Mol Inform. 2010 May 17;29(5):441-55. doi: 10.1002/minf.201000036.

Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening.

Wiley Interdiscip Rev Comput Mol Sci. 2015 Nov-Dec;5(6):405-424. doi: 10.1002/wcms.1225. Epub 2015 Aug 28.

Innovation in the pharmaceutical industry: New estimates of R&D costs.

J Health Econ. 2016 May;47:20-33. doi: 10.1016/j.jhealeco.2016.01.012. Epub 2016 Feb 12.

Better Informed Distance Geometry: Using What We Know To Improve Conformation Generation.

J Chem Inf Model. 2015 Dec 28;55(12):2562-74. doi: 10.1021/acs.jcim.5b00654. Epub 2015 Nov 30.

UFSRAT: Ultra-fast Shape Recognition with Atom Types--the discovery of novel bioactive small molecular scaffolds for FKBP12 and 11βHSD1.

PLoS One. 2015 Feb 6;10(2):e0116570. doi: 10.1371/journal.pone.0116570. eCollection 2015.

Machine-learning approaches in drug discovery: methods and applications.

Drug Discov Today. 2015 Mar;20(3):318-31. doi: 10.1016/j.drudis.2014.10.012. Epub 2014 Nov 4.

Virtual screening strategies in drug discovery: a critical review.

Curr Med Chem. 2013;20(23):2839-60. doi: 10.2174/09298673113209990001.

USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints.

J Cheminform. 2012 Nov 6;4(1):27. doi: 10.1186/1758-2946-4-27.

Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking.

J Med Chem. 2012 Jul 26;55(14):6582-94. doi: 10.1021/jm300687e. Epub 2012 Jul 5.
