Maziarz Krzysztof, Tripp Austin, Liu Guoqing, Stanley Megan, Xie Shufang, Gaiński Piotr, Seidl Philipp, Segler Marwin H S
Microsoft Research AI for Science.
Faraday Discuss. 2025 Jan 14;256(0):568-586. doi: 10.1039/d4fd00093e.
Automated synthesis planning has recently re-emerged as a research area at the intersection of chemistry and machine learning. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques, and unnecessarily hamper progress. To remedy this, we present a synthesis planning library with an extensive benchmarking framework, called SYNTHESEUS, which promotes best practice by default, enabling consistent, meaningful evaluation of single-step and multi-step synthesis planning algorithms. We demonstrate the capabilities of SYNTHESEUS by re-evaluating several previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes in controlled evaluation experiments. We end with guidance for future work in this area, and call on the community to engage in the discussion on how to improve benchmarks for synthesis planning.