Transfer Learning for Heterocycle Retrosynthesis.

Author information

Wieczorek Ewa, Sin Joshua W, Tanovic Sara, Holland Matthew T O, Wilbraham Liam, Sebastián-Pérez Victor, Bradley Anthony, Miketa Dominik, Brennan Paul E, Duarte Fernanda

Affiliations

Chemistry Research Laboratory, 12 Mansfield Road, Oxford OX1 3TA, U.K.

Alzheimer's Research UK Oxford Drug Discovery Institute, Centre for Artificial Intelligence in Precision Medicine, Centre for Medicines Discovery, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7FZ, U.K.

Publication information

J Chem Inf Model. 2025 Aug 11;65(15):7851-7861. doi: 10.1021/acs.jcim.4c02041. Epub 2025 Jul 29.

Abstract

Heterocycles are important scaffolds in medicinal chemistry that can be used to modulate both the binding mode and the pharmacokinetic properties of drugs. Their importance is reflected in the many published data sets of heterocyclic rings and their properties. However, these data sets do not include synthetic routes to the heterocycles they describe, so novel and uncommon heterocycles are not easily accessible synthetically. Retrosynthesis prediction models can usually assist synthetic chemists, but they perform poorly on heterocycle-forming reactions because little training data is available for them. In this work, we compare four transfer learning methods for overcoming this low-data problem and improving the performance of retrosynthesis prediction models on ring-breaking disconnections. The mixed fine-tuned model achieves a top-1 accuracy of 36.5%, and 62.1% of its predictions are both chemically valid and ring-breaking. We further demonstrate the applicability of the mixed fine-tuned model in drug discovery by recreating synthetic routes to two drug-like targets published in 2023. Finally, we introduce a method for further fine-tuning the model as new reaction data become available.
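The abstract reports two metrics: top-1 accuracy and the share of predictions that are both chemically valid and ring-breaking. The Python sketch below shows one way such metrics could be computed; it is not the authors' evaluation code. It assumes that predictions and ground truth are canonical SMILES strings, treats a top-1 prediction as ring-breaking when the product has more rings than the predicted reactants combined, and counts rings from SMILES ring-closure labels rather than with a cheminformatics toolkit. All function names (`reactant_set`, `top1_accuracy`, `count_ring_closures`, `ring_breaking_fraction`) and the `is_valid` hook are hypothetical, and the reactions are toy examples.

```python
from typing import Callable, List


def reactant_set(smiles: str) -> frozenset:
    """Treat a multi-component SMILES such as "A.B" as an unordered set.

    Assumes all SMILES were canonicalised by the same toolkit beforehand,
    so string equality stands in for molecular identity.
    """
    return frozenset(part for part in smiles.split(".") if part)


def top1_accuracy(predictions: List[List[str]], targets: List[str]) -> float:
    """Fraction of products whose highest-ranked prediction matches the recorded reactants."""
    hits = sum(
        1
        for ranked, truth in zip(predictions, targets)
        if ranked and reactant_set(ranked[0]) == reactant_set(truth)
    )
    return hits / len(targets) if targets else 0.0


def count_ring_closures(smiles: str) -> int:
    """Count ring-closure pairs, i.e. independent rings, in a valid SMILES.

    Digits inside [...] are isotopes, charges or H counts, not ring labels.
    Two-digit labels are written as %nn. Assumes no ring bond spans a '.'.
    """
    labels, i, in_bracket = 0, 0, False
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif not in_bracket and ch == "%":
            labels += 1
            i += 3  # skip '%' and its two digits
            continue
        elif not in_bracket and ch.isdigit():
            labels += 1
        i += 1
    # Each ring label is written twice: once to open and once to close the ring.
    return labels // 2


def ring_breaking_fraction(
    products: List[str],
    predictions: List[List[str]],
    is_valid: Callable[[str], bool] = lambda smi: True,
) -> float:
    """Fraction of top-1 predictions that pass is_valid and have fewer rings than the product.

    is_valid is a hypothetical placeholder hook; a real pipeline would parse
    the SMILES with a cheminformatics toolkit instead of accepting everything.
    """
    count = 0
    for product, ranked in zip(products, predictions):
        if not ranked:
            continue
        top = ranked[0]
        if is_valid(top) and count_ring_closures(product) > count_ring_closures(top):
            count += 1
    return count / len(products) if products else 0.0


# Toy data: a Paal-Knorr pyrrole synthesis and an N-methylation.
products = ["Cc1ccc(C)[nH]1", "Cn1cccc1"]
predictions = [
    ["CC(=O)CCC(C)=O.N"],                 # hexane-2,5-dione + ammonia
    ["c1cc[nH]c1.CI", "c1cc[nH]c1.CBr"],  # ranked candidates for the second product
]
targets = ["N.CC(=O)CCC(C)=O", "c1cc[nH]c1.CBr"]

print(top1_accuracy(predictions, targets))            # 0.5
print(ring_breaking_fraction(products, predictions))  # 0.5
```

Counting ring-closure pairs works because, in a valid SMILES, every ring label adds exactly one bond beyond the spanning tree of the molecule, so the number of label pairs equals the number of independent rings. In practice, a toolkit such as RDKit would handle canonicalisation, validity checking and ring perception more robustly than this string-level sketch.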
