Cong Shan, Zhang Meng, Song Yu, Chang Sihao, Tian Jing, Zeng Hongji, Ji Hongchao
College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, P.R. China.
Qingdao Innovation and Development Center, Harbin Engineering University, Qingdao 266000, P.R. China.
Patterns (N Y). 2025 Apr 30;6(8):101259. doi: 10.1016/j.patter.2025.101259. eCollection 2025 Aug 8.
Natural products (NPs) play a vital role in drug discovery, with many FDA-approved drugs derived from these compounds. Despite their significance, the biosynthetic pathways of NPs remain poorly characterized, owing to their inherent complexity and to the limitations of traditional retrosynthesis methods in predicting such intricate reactions. While template-free machine learning models have demonstrated promise in organic synthesis, their application to biosynthetic pathways is still in its infancy. To address this gap, we propose the graph-sequence enhanced transformer (GSETransformer), which leverages both graph structural information and sequential dependencies to handle the complexity of biosynthetic data. When evaluated on benchmark datasets, GSETransformer achieves state-of-the-art performance in single- and multi-step retrosynthesis tasks. These results highlight its effectiveness in computational biosynthesis and its potential to facilitate the design of NP-based therapeutics.