Hartog Peter B R, Westerlund Annie M, Tetko Igor V, Genheden Samuel
Molecular AI, Discovery Sciences, R&D, AstraZeneca, Pepparedsleden 1, 431 83 Mölndal, Sweden.
Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich - German Research Center for Environmental Health (GmbH), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany.
J Chem Inf Model. 2025 Feb 24;65(4):1771-1781. doi: 10.1021/acs.jcim.4c01821. Epub 2025 Jan 31.
The efficiency of machine learning (ML) models is crucial for minimizing inference times and reducing the carbon footprint of models deployed in production environments. Current models used in retrosynthesis, which generate a synthesis route from a target molecule to purchasable compounds, are prohibitively slow. These models operate in a single-step fashion within a tree search algorithm, predicting reactant molecules given a product molecule as input. In this study, we investigate whether alternative transformer architectures, knowledge distillation (KD), and simple hyper-parameter optimization can decrease the inference time of the Chemformer model. We first assess closely related transformer architectures and conclude that these models under-performed when using KD. We then investigate the effects of feature-based and response-based KD together with hyper-parameters optimized for inference sample time and model accuracy. Although reducing model size and improving single-step speed are important, our results indicate that multi-step search efficiency is influenced more strongly by the diversity and confidence of the single-step model. Based on this work, further research should use KD in combination with other techniques, as multi-step speed continues to prevent proper integration of synthesis planning. However, in Monte Carlo-based (MC) multi-step retrosynthesis, other factors play a crucial role in balancing exploration and exploitation during the search process, often outweighing the direct impact of single-step model speed and carbon footprint.
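For readers unfamiliar with the response-based KD mentioned in the abstract, the following is a minimal sketch of the generic formulation: the student is trained to match the teacher's temperature-softened output distribution. It is an illustration only, not the training objective used for Chemformer in this study; the function names `log_softmax` and `response_kd_loss`, the temperature value, and the example logits are all assumptions made here for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def response_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: the generic response-based KD loss,
    not necessarily the loss used in the paper.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (soft targets)
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Made-up logits over a three-token vocabulary, for illustration only.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(response_kd_loss(student, teacher))
```

The loss is zero when student and teacher logits agree and grows as their softened distributions diverge. Feature-based KD, by contrast, would compare intermediate representations rather than output distributions and is not shown here.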