Exposing the Limitations of Molecular Machine Learning with Activity Cliffs.

Affiliations

Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands.

Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands.

Publication information

J Chem Inf Model. 2022 Dec 12;62(23):5938-5951. doi: 10.1021/acs.jcim.2c01073. Epub 2022 Dec 1.

DOI:10.1021/acs.jcim.2c01073

PMID:36456532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9749029/

Abstract

Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in structure but exhibit large differences in potency) have received limited attention for their effect on model performance. These edge cases are informative for molecule discovery and optimization, and models that are well equipped to accurately predict the potency of activity cliff compounds have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
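
The definition of an activity cliff and the idea of an "activity-cliff-centered" metric can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the MoleculeACE API: the function names, the toy fingerprints (plain Python sets standing in for real molecular fingerprints), the Tanimoto similarity threshold of 0.9, and the potency gap of one log unit (a 10-fold difference) are all hypothetical choices. The sketch flags compounds that take part in at least one cliff pair, then compares the error over all compounds with the error over the cliff compounds only.

```python
import math
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cliff_compounds(fingerprints, potencies, sim_threshold=0.9, gap=1.0):
    """Return indices of compounds that belong to at least one cliff pair.

    A pair counts as a cliff when the two compounds are highly similar
    (similarity >= sim_threshold) yet differ in potency by at least `gap`
    log units. Both thresholds are illustrative assumptions.
    """
    cliff = set()
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        big_gap = abs(potencies[i] - potencies[j]) >= gap
        if similar and big_gap:
            cliff.update((i, j))
    return cliff


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error, optionally restricted to a subset of indices."""
    idx = sorted(indices) if indices is not None else range(len(y_true))
    squared = [(y_true[k] - y_pred[k]) ** 2 for k in idx]
    return math.sqrt(sum(squared) / len(squared))


# Toy data: four hypothetical compounds with made-up fingerprint bits.
fps = [
    set(range(20)),                 # compound A
    set(range(19)) | {99},          # compound B, near-identical to A
    set(range(100, 120)),           # compound C, unrelated to A and B
    set(range(100, 119)) | {199},   # compound D, near-identical to C
]
pki_true = [8.5, 6.9, 7.0, 7.2]     # measured potency on a log scale
pki_pred = [7.6, 7.5, 7.1, 7.0]     # predictions from some hypothetical model

cliff_idx = find_cliff_compounds(fps, pki_true)
rmse_all = rmse(pki_true, pki_pred)
rmse_cliff = rmse(pki_true, pki_pred, cliff_idx)

print("cliff compounds:", sorted(cliff_idx))
print("RMSE (all):", round(rmse_all, 3))
print("RMSE (cliff only):", round(rmse_cliff, 3))
```

In this toy case only A and B form a cliff, because C and D are similar in both structure and potency. Reporting the cliff-only error next to the overall error shows how an aggregate score can hide poor predictions on cliff compounds.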

Similar articles

Prediction of activity cliffs using support vector machines.

J Chem Inf Model. 2012 Sep 24;52(9):2354-65. doi: 10.1021/ci300306a. Epub 2012 Aug 23.

Prediction of individual compounds forming activity cliffs using emerging chemical patterns.

J Chem Inf Model. 2013 Dec 23;53(12):3131-9. doi: 10.1021/ci400597d. Epub 2013 Dec 10.

Activity cliffs in drug discovery: Dr Jekyll or Mr Hyde?

Drug Discov Today. 2014 Aug;19(8):1069-80. doi: 10.1016/j.drudis.2014.02.003. Epub 2014 Feb 20.

Extending the activity cliff concept: structural categorization of activity cliffs and systematic identification of different types of cliffs in the ChEMBL database.

J Chem Inf Model. 2012 Jul 23;52(7):1806-11. doi: 10.1021/ci300274c. Epub 2012 Jul 12.

Searching for coordinated activity cliffs using particle swarm optimization.

J Chem Inf Model. 2012 Apr 23;52(4):927-34. doi: 10.1021/ci3000503. Epub 2012 Mar 29.

Method for the evaluation of structure-activity relationship information associated with coordinated activity cliffs.

J Med Chem. 2014 Aug 14;57(15):6553-63. doi: 10.1021/jm500577n. Epub 2014 Jul 17.

MMP-Cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs.

J Chem Inf Model. 2012 May 25;52(5):1138-45. doi: 10.1021/ci3001138. Epub 2012 Apr 17.

Do medicinal chemists learn from activity cliffs? A systematic evaluation of cliff progression in evolving compound data sets.

J Med Chem. 2013 Apr 25;56(8):3339-45. doi: 10.1021/jm400147j. Epub 2013 Apr 9.

Molecular scaffolds with high propensity to form multi-target activity cliffs.

J Chem Inf Model. 2010 Apr 26;50(4):500-10. doi: 10.1021/ci100059q.

Cited by

Going beyond SMILES enumeration for data augmentation in generative drug discovery.

Digit Discov. 2025 Aug 14. doi: 10.1039/d5dd00028a.

ACtriplet: An improved deep learning model for activity cliffs prediction by integrating triplet loss and pre-training.

J Pharm Anal. 2025 Aug;15(8):101317. doi: 10.1016/j.jpha.2025.101317. Epub 2025 Apr 21.

Digital Alchemy: The Rise of Machine and Deep Learning in Small-Molecule Drug Discovery.

Int J Mol Sci. 2025 Jul 16;26(14):6807. doi: 10.3390/ijms26146807.

The topology of molecular representations and its influence on machine learning performance.

J Cheminform. 2025 Jul 21;17(1):109. doi: 10.1186/s13321-025-01045-w.

Undersampling techniques for non-linear chemical space visualization.

bioRxiv. 2025 Jul 7:2025.07.03.663077. doi: 10.1101/2025.07.03.663077.

Generative Deep Learning for de Novo Drug Design: A Chemical Space Odyssey.

J Chem Inf Model. 2025 Jul 28;65(14):7352-7372. doi: 10.1021/acs.jcim.5c00641. Epub 2025 Jul 9.

ACES-GNN: can graph neural network learn to explain activity cliffs?

Digit Discov. 2025 Jun 30. doi: 10.1039/d5dd00012b.

Enhancing Drug-Target Interaction Prediction through Transfer Learning from Activity Cliff Prediction Tasks.

J Chem Inf Model. 2025 Jul 14;65(13):6558-6567. doi: 10.1021/acs.jcim.5c00484. Epub 2025 Jun 30.

TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery.

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf307.

Data efficient molecular image representation learning using foundation models.

Chem Sci. 2025 May 22. doi: 10.1039/d5sc00907c.
