Challenging Reaction Prediction Models to Generalize to Novel Chemistry.

Author information

Bradshaw John, Zhang Anji, Mahjour Babak, Graff David E, Segler Marwin H S, Coley Connor W

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Publication information

ACS Cent Sci. 2025 Mar 12;11(4):539-549. doi: 10.1021/acscentsci.5c00055. eCollection 2025 Apr 23.

DOI:10.1021/acscentsci.5c00055

PMID:40290152

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12022916/

Abstract

Deep learning models for anticipating the products of organic reactions have found many use cases, including validating retrosynthetic pathways and constraining synthesis-based molecular design tools. Despite compelling performance on popular benchmark tasks, strange and erroneous predictions sometimes ensue when using these models in practice. The core issue is that common benchmarks test models in an in-distribution setting, whereas many real-world uses for these models are in out-of-distribution settings and require a greater degree of extrapolation. To better understand how current reaction predictors work in out-of-distribution domains, we report a series of more challenging evaluations of a prototypical SMILES-based deep learning model. First, we illustrate how performance on randomly sampled data sets is overly optimistic compared to performance when generalizing to new patents or new authors. Second, we conduct time splits that evaluate how models perform when tested on reactions published years after those in their training set, mimicking real-world deployment. Finally, we consider extrapolation across reaction classes to reflect what would be required for the discovery of novel reaction types. This panel of tasks can reveal the capabilities and limitations of today's reaction predictors, acting as a crucial first step in the development of tomorrow's next-generation models capable of reaction discovery.
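
The difference between the split strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy records, the field layout (reaction SMILES, source document, year), and the function names `random_split`, `document_split`, and `time_split` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Invented toy records: (reaction SMILES, source document id, publication year).
# None of these values come from the paper.
reactions = [
    ("CCO.CC(=O)O>>CC(=O)OCC", "doc_A", 2010),
    ("CN.CC(=O)Cl>>CC(=O)NC", "doc_A", 2010),
    ("CCBr.[OH-]>>CCO", "doc_B", 2014),
    ("CCCBr.[OH-]>>CCCO", "doc_B", 2014),
    ("CCN.CC(=O)Cl>>CC(=O)NCC", "doc_C", 2019),
    ("CO.CC(=O)O>>CC(=O)OC", "doc_C", 2019),
]

def random_split(records, test_fraction, seed=0):
    """Shuffle individual reactions; a document can land in both sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def document_split(records, test_fraction, seed=0):
    """Hold out whole documents, so no document is shared across sets."""
    by_doc = defaultdict(list)
    for rec in records:
        by_doc[rec[1]].append(rec)
    docs = sorted(by_doc)
    random.Random(seed).shuffle(docs)
    n_test = max(1, round(len(docs) * test_fraction))
    test_docs = set(docs[:n_test])
    train = [r for r in records if r[1] not in test_docs]
    test = [r for r in records if r[1] in test_docs]
    return train, test

def time_split(records, cutoff_year):
    """Train on reactions up to the cutoff year, test on later ones."""
    train = [r for r in records if r[2] <= cutoff_year]
    test = [r for r in records if r[2] > cutoff_year]
    return train, test

rand_train, rand_test = random_split(reactions, test_fraction=0.33)
doc_train, doc_test = document_split(reactions, test_fraction=0.34)
time_train, time_test = time_split(reactions, cutoff_year=2014)
```

In this sketch only the random split allows reactions from the same document to appear on both sides, which is one plausible reason such splits look easier than document-level or time-based ones.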

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e2f/12022916/8e1b4898dc6f/oc5c00055_0001.jpg
