Li Junren, Fang Lei, Lou Jian-Guang
College of Chemistry and Molecular Engineering, Peking University, No. 5 Yiheyuan Road, Beijing, 100871, China.
Microsoft Research Asia, Building 2, No. 5 Dan Ling Street, Beijing, 100080, China.
J Cheminform. 2023 Jun 8;15(1):58. doi: 10.1186/s13321-023-00727-7.
Retrosynthesis is an important task in organic chemistry. Recently, numerous data-driven approaches have achieved promising results on this task. In practice, however, these methods can produce sub-optimal outcomes because their predictions follow the training data distribution, a phenomenon we refer to as frequency bias. For example, in template-based approaches, low-ranked predictions are typically generated by less common templates whose confidence scores may be too low to compare meaningfully, yet the recorded reactants are sometimes found among these low-ranked predictions. In this work, we introduce RetroRanker, a ranking model built upon graph neural networks and designed to mitigate frequency bias in the predictions of existing retrosynthesis models through re-ranking. RetroRanker incorporates the potential reaction changes that each set of predicted reactants would undergo in yielding the given product, in order to lower the rank of chemically unreasonable predictions. Re-ranked results on publicly available retrosynthesis benchmarks demonstrate that RetroRanker improves most state-of-the-art models. Our preliminary studies also indicate that RetroRanker can enhance the performance of multi-step retrosynthesis.
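The re-ranking idea described in the abstract can be illustrated with a minimal sketch. This is not RetroRanker's implementation: the function `rerank`, the `plausibility` callable, the candidate labels, and every number below are hypothetical placeholders chosen for illustration. In the paper the second-stage score comes from a graph neural network; here it is stubbed with a lookup table so the example stays self-contained.

```python
from typing import Callable, List, Tuple

# (label for a predicted reactant set, confidence score from the base model)
Candidate = Tuple[str, float]


def rerank(product: str,
           candidates: List[Candidate],
           plausibility: Callable[[str, str], float]) -> List[Candidate]:
    """Reorder candidates by a second-stage plausibility score, highest first.

    sorted() is stable, so candidates with equal plausibility keep the
    order assigned by the base model.
    """
    return sorted(candidates,
                  key=lambda c: plausibility(product, c[0]),
                  reverse=True)


# Hypothetical base-model output: the frequent template wins on confidence,
# while the recorded reactants ("candidate_C") sit at the bottom of the list.
base_predictions: List[Candidate] = [
    ("candidate_A", 0.80),
    ("candidate_B", 0.15),
    ("candidate_C", 0.05),
]

# Stand-in for a learned ranker: invented scores, not real model output.
toy_scores = {"candidate_A": 0.2, "candidate_B": 0.5, "candidate_C": 0.9}


def toy_plausibility(product: str, reactants: str) -> float:
    # Ignores the product; a real ranker would score the product/reactant pair.
    return toy_scores[reactants]


reranked = rerank("product_X", base_predictions, toy_plausibility)
print([label for label, _ in reranked])
```

The base model's confidence scores are carried along unchanged and only the order differs, which mirrors the abstract's framing of re-ranking as a step applied on top of existing retrosynthesis models.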