Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center.

Authors

Wang Xiaorui, Hsieh Chang-Yu, Yin Xiaodan, Wang Jike, Li Yuquan, Deng Yafeng, Jiang Dejun, Wu Zhenxing, Du Hongyan, Chen Hongming, Li Yun, Liu Huanxiang, Wang Yuwei, Luo Pei, Hou Tingjun, Yao Xiaojun

Affiliations

Dr. Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, 999078, China.

CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China.

Publication information

Research (Wash D C). 2023 Oct 16;6:0231. doi: 10.34133/research.0231. eCollection 2023.

Abstract

Effective synthesis planning powered by deep learning (DL) can significantly accelerate the discovery of new drugs and materials. However, most DL-assisted synthesis planning methods offer little or no capability to recommend suitable reaction conditions (RCs) for their reaction predictions. Currently, the prediction of RCs with a DL framework is hindered by several factors: (a) the lack of a standardized dataset for benchmarking, (b) the lack of a general prediction model with powerful representation, and (c) the lack of interpretability. To address these issues, we first created two standardized RC datasets covering a broad range of reaction classes and then proposed a powerful and interpretable Transformer-based RC predictor named Parrot. Through careful design of the model architecture, pretraining method, and training strategy, Parrot improved the overall top-3 prediction accuracy on catalysts, solvents, and other reagents by as much as 13.44% compared with the best previous model on a newly curated dataset. Additionally, the mean absolute error of the predicted temperatures was reduced by about 4 °C. Furthermore, Parrot shows strong generalization capacity, with superior cross-chemical-space prediction accuracy. Attention analysis indicates that Parrot effectively captures crucial chemical information and exhibits a high level of interpretability in the prediction of RCs. Parrot exemplifies how a modern neural network architecture, when appropriately pretrained, can make reliable, generalizable, and interpretable recommendations for RCs even when the underlying training dataset may still be limited in diversity.
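The two headline metrics in the abstract, top-3 accuracy and mean absolute error (MAE), can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the toy condition labels, and the temperatures are all hypothetical, and the paper's "overall" top-3 accuracy may combine several condition slots in a way this per-slot version does not capture.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   targets: Sequence[str],
                   k: int = 3) -> float:
    """Fraction of reactions whose recorded condition is among the first k candidates."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one reaction is required")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, targets)
               if truth in ranked[:k])
    return hits / len(targets)


def mean_absolute_error(predicted: Sequence[float],
                        observed: Sequence[float]) -> float:
    """Average absolute difference, e.g. between predicted and recorded temperatures."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical ranked candidates for one condition slot of three reactions.
ranked = [
    ["Pd/C", "Pd(PPh3)4", "CuI", "none"],
    ["THF", "DMF", "toluene", "water"],
    ["DMF", "THF", "MeOH", "DCM"],
]
recorded = ["Pd(PPh3)4", "water", "DMF"]

top3 = top_k_accuracy(ranked, recorded, k=3)  # 2 of 3 reactions are hits
top1 = top_k_accuracy(ranked, recorded, k=1)  # only the third reaction is a hit

# Hypothetical temperatures in degrees Celsius.
temperature_mae = mean_absolute_error([25.0, 80.0, 110.0], [20.0, 90.0, 110.0])
```

Here the second reaction counts as a miss at k = 3 because its recorded solvent is ranked fourth, which is why top-k accuracy is usually reported for several values of k.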

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bbdf/10578430/19d6889e520f/research.0231.fig.001.jpg
