Sagawa Tatsuya, Kojima Ryosuke
Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan.
RIKEN BDR, Kobe, 650-0047, Japan.
J Cheminform. 2025 Aug 19;17(1):126. doi: 10.1186/s13321-025-01075-4.
Accurate chemical reaction prediction is critical for reducing both cost and time in drug development. This study introduces ReactionT5, a transformer-based chemical reaction foundation model pre-trained on the Open Reaction Database, a large publicly available reaction dataset. In benchmarks for product prediction, retrosynthesis, and yield prediction, ReactionT5 outperformed existing models. Specifically, ReactionT5 achieved 97.5% accuracy in product prediction, 71.0% in retrosynthesis, and a coefficient of determination of 0.947 in yield prediction. Remarkably, ReactionT5, when fine-tuned with only a limited dataset of reactions, achieved performance on par with models fine-tuned on the complete dataset. Additionally, the visualization of ReactionT5 embeddings illustrates that the model successfully captures and represents the chemical reaction space, indicating effective learning of reaction properties.
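The abstract reports two kinds of metric: accuracy for product prediction and retrosynthesis, and a coefficient of determination for yield prediction. The sketch below shows how such metrics are commonly computed. It is not taken from the paper: the function names `exact_match_accuracy` and `r_squared` are hypothetical, the accuracy is assumed to be a plain exact string match on SMILES (real evaluations usually canonicalize SMILES first, which is omitted here), and the example values are made up for illustration.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference string.

    Assumption: strings are compared as-is, with no SMILES canonicalization.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one example is required")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(exact_match_accuracy(["CCO", "CC"], ["CCO", "CO"]))
    print(r_squared([10.0, 50.0, 90.0], [12.0, 48.0, 85.0]))
```

An R^2 of 1.0 means the predicted yields match the measured ones exactly, while 0.0 means the model does no better than always predicting the mean yield.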