Single-Step Retrosynthesis Prediction Based on the Identification of Potential Disconnection Sites Using Molecular Substructure Fingerprints.

Affiliation

Department of Computer Science, School of Computing, Tokyo Institute of Technology, W8-85, 2-12-1, Ookayama, Meguro 152-8552, Tokyo, Japan.

Publication information

J Chem Inf Model. 2021 Feb 22;61(2):641-652. doi: 10.1021/acs.jcim.0c01100. Epub 2021 Feb 3.

Abstract

The proper application of retrosynthesis to identify possible transformations for a given target compound requires a lot of chemistry knowledge and experience. However, because the complexity of this technique scales together with the complexity of the target, efficient application on compounds with intricate molecular structures becomes almost impossible for human chemists. The idea of using computers in such situations has existed for a long time, but the accuracy was not sufficient for practical applications. Nevertheless, with the steady improvement of machine learning and artificial intelligence in recent years, computer-assisted retrosynthesis has been gaining research attention again. Because of the overall lack of chemical reaction data, the main challenge for the recent retrosynthesis methods is low exploration ability during the analysis of target and intermediate compounds. The main goal of this research is to develop a novel, template-free approach to address this issue. Only individual molecular substructures of the target are used to determine potential disconnection sites, without relying on additional information such as chemical reaction class. The model for the identification of potential disconnection sites is trained on novel molecular substructure fingerprint representations. For each of the disconnections suggested using the model, a simple structural similarity-based reactant retrieval and scoring method is applied, and the suggestions are completed. This method achieves 47.2% top-1 accuracy for the single-step retrosynthesis task on the processed United States Patent Office dataset. Furthermore, if the predicted reaction class is used to narrow down the reactant candidate search space, the performance is improved to 61.4% top-1 accuracy.
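The abstract relies on two generic ideas. The first is ranking reactant candidates by structural similarity. The second is evaluation by top-1 accuracy, meaning the share of test targets whose first-ranked suggestion matches the recorded reactants. The sketch below illustrates both under loudly stated assumptions. Each molecule is reduced to a hypothetical set of substructure identifiers. Similarity is plain Tanimoto (Jaccard) on those sets. The names `tanimoto`, `rank_candidates`, and `top_k_accuracy` are illustrative only. This is not the authors' fingerprint representation, disconnection-site model, or scoring function.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a molecule's "fingerprint" is the set of
# substructure identifiers it contains (stand-ins for hashed atom environments).
Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of substructure identifiers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(query: Fingerprint,
                    library: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Rank library entries by similarity to the query, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    # Sort by descending score; break ties by name so the order is deterministic.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   ground_truth: Sequence[str], k: int) -> float:
    """Fraction of targets whose true answer appears among the first k predictions."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    hits = sum(truth in list(preds)[:k] for preds, truth in zip(predictions, ground_truth))
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Toy, made-up library of candidate reactants (not real data).
    library = {
        "A": frozenset({"c1", "c2", "n1"}),
        "B": frozenset({"c1", "o1"}),
        "C": frozenset({"s1"}),
    }
    query = frozenset({"c1", "c2"})
    print([name for name, _ in rank_candidates(query, library)])  # → ['A', 'B', 'C']
    print(top_k_accuracy([["A", "B"], ["C", "A"]], ["A", "A"], k=1))  # → 0.5
```

In this toy run, widening `k` from 1 to 2 raises the accuracy to 1.0. That is why top-k figures for k > 1 are always at least as high as the top-1 numbers reported above.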