Capela João, Cheixo João, de Ridder Dick, Dias Oscar, Rocha Miguel
Centre of Biological Engineering, University of Minho, 4710-057, Braga, Portugal.
Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands.
J Integr Bioinform. 2025 Mar 20. doi: 10.1515/jib-2024-0050.
Plants produce specialized metabolites, which play critical roles in defending against biotic and abiotic stresses. Due to their chemical diversity and bioactivity, these compounds have significant economic implications, particularly in the pharmaceutical and agrotechnology sectors. Despite their importance, the biosynthetic pathways of these metabolites remain largely unresolved. Automating the prediction of their precursors, derived from primary metabolism, is essential for accelerating pathway discovery. Using DeepMol's automated machine learning engine, we found that regularized linear classifiers offer optimal, accurate, and interpretable models for this task, outperforming state-of-the-art models while providing chemical insights into their predictions. The pipeline and models are available at the repository: https://github.com/jcapels/SMPrecursorPredictor.
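To illustrate why a regularized linear classifier can be both accurate and interpretable, here is a minimal sketch in pure Python. It is not the authors' pipeline and does not use DeepMol's API. The feature names, the toy fingerprints, and the precursor labels are all hypothetical and serve only to show how learned weights can be read as chemical hints.

```python
import math


def train_l2_logistic(X, y, l2=0.1, lr=0.5, epochs=500):
    """Fit a binary logistic regression with an L2 penalty by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # The L2 term shrinks every weight toward zero (the bias is not penalized).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that a molecule with fingerprint x derives from the precursor."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical substructure features (assumed names, not taken from the paper).
FEATURES = ["indole_ring", "isoprene_unit", "phenol_group"]

# Hypothetical toy fingerprints: 1 means the substructure is present.
X = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
# Made-up labels: 1 = assumed to derive from one given precursor, 0 = not.
y = [1, 1, 0, 0, 1, 0]

w, b = train_l2_logistic(X, y)

# Interpretation step: rank features by the magnitude of their weight.
ranked = sorted(zip(FEATURES, w), key=lambda t: abs(t[1]), reverse=True)
for name, weight in ranked:
    print(f"{name}: {weight:+.3f}")
```

In this toy setting, the first feature receives a positive weight, the second a negative one, and the uninformative third stays near zero. A multi-precursor version would train one such classifier per precursor; that extension is an assumption here, not a description of the published models.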