Prediction of plant secondary metabolic pathways using deep transfer learning.

Affiliations

CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China.

University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China.

Publication information

BMC Bioinformatics. 2023 Sep 19;24(1):348. doi: 10.1186/s12859-023-05485-9.

DOI:10.1186/s12859-023-05485-9
PMID:37726702
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10507959/
Abstract

BACKGROUND

Plant secondary metabolites are highly valued for their applications in pharmaceuticals, nutrition, flavors, and aesthetics. Elucidating plant secondary metabolic pathways is important because these pathways play crucial roles in biological processes during plant growth and development. However, understanding plant biosynthesis and degradation pathways remains challenging because current databases lack sufficient information. To address this issue, we propose a transfer learning approach using a pre-trained hybrid deep learning architecture that combines a Graph Transformer and a convolutional neural network (GTC) to predict plant metabolic pathways.
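The abstract does not say how GTC's Graph Transformer branch computes attention. Below is a minimal sketch of one common graph-transformer variant, assuming attention restricted to each atom's bonded neighbours (plus itself), identity query/key/value projections, a single head, and a mean-pooling readout. A trained model would learn separate projection matrices and stack several heads and layers; the helper names `neighbor_attention` and `mean_readout` are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_attention(h: Matrix, adj: Sequence[Sequence[int]]) -> Matrix:
    """One attention head in which atom i attends only to itself and its bonded neighbours.

    h: n x d atom features; adj: n x n 0/1 bond matrix.
    Query, key and value projections are the identity here (assumption for illustration;
    a trained model would learn them).
    """
    n, d = len(h), len(h[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for i in range(n):
        nbrs = [j for j in range(n) if j == i or adj[i][j]]
        scores = [scale * sum(a * b for a, b in zip(h[i], h[j])) for j in nbrs]
        weights = softmax(scores)
        # Each updated atom vector is a weighted average of its neighbourhood.
        out.append([sum(w * h[j][k] for w, j in zip(weights, nbrs)) for k in range(d)])
    return out


def mean_readout(h: Matrix) -> List[float]:
    """Average atom vectors into one molecule-level vector."""
    return [sum(row[k] for row in h) / len(h) for k in range(len(h[0]))]


if __name__ == "__main__":
    # Heavy atoms of ethanol (SMILES "CCO"), one-hot features [is_C, is_O].
    h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    updated = neighbor_attention(h, adj)
    print(updated)
    print(mean_readout(updated))
```

With one-hot inputs, every updated row remains a convex combination, so the oxygen atom keeps most of its own identity while mixing in some of its carbon neighbour.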

RESULTS

GTC provides a comprehensive molecular representation by extracting both structural features from the molecular graph and textual information from the SMILES string. GTC is pre-trained on the KEGG datasets to acquire general features and then fine-tuned on plant-derived datasets. Four metrics (accuracy, precision, recall, and F1 score) were used to evaluate model performance. On the KEGG dataset, GTC outperforms six other models, including three previously reported machine learning models, with an accuracy of 96.75%, precision of 85.14%, recall of 83.03%, and F1 score of 84.06%. An ablation study further confirms that every component of the hybrid GTC model is indispensable. Transfer learning is then employed to leverage the shared knowledge acquired from the KEGG metabolic pathways. As a result, the transferred GTC predicts plant secondary metabolic pathways with an average accuracy of 98.30% in fivefold cross-validation and 97.82% on the final test. In addition, GTC is applied to classify natural products, achieving an accuracy of 100.00% for alkaloids and its lowest accuracy, 98.42%, for shikimates and phenylpropanoids.
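The four metrics and the fivefold cross-validation protocol can be sketched as follows. This assumes a single-label, multi-class setting with per-class (macro) averaging and a plain unshuffled five-fold split. The abstract states neither the averaging scheme, nor whether the task is multi-label, nor how folds were drawn; `macro_metrics` and `kfold_indices` are hypothetical helpers. Macro F1 averaged over classes need not equal the harmonic mean of macro precision and recall. Because the harmonic mean is concave, the latter is never smaller: from the reported values, 2 × 85.14 × 83.03 / (85.14 + 83.03) ≈ 84.07, slightly above the reported 84.06%.

```python
from typing import Dict, List, Sequence, Tuple


def macro_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus per-class precision/recall/F1 averaged over classes (macro averaging)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    prec_sum = rec_sum = f1_sum = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        prec_sum += prec
        rec_sum += rec
        f1_sum += f1
    k = len(labels)
    return {"accuracy": accuracy, "precision": prec_sum / k, "recall": rec_sum / k, "f1": f1_sum / k}


def kfold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; each fold serves once as the validation set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, val))
        start += size
    return splits


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_metrics(y_true, y_pred))
    for train, val in kfold_indices(12, k=5):
        print(len(train), val)
    # Harmonic mean of the reported macro precision and recall (percent).
    print(round(2 * 85.14 * 83.03 / (85.14 + 83.03), 2))  # 84.07
```

In practice the indices would be shuffled (and often stratified by class) before splitting. The average accuracy over the five validation folds is what a figure such as "98.30% in fivefold cross-validation" would summarize.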

CONCLUSIONS

The proposed GTC effectively captures molecular features and achieves high performance both in classifying KEGG metabolic pathways and, via transfer learning, in predicting plant secondary metabolic pathways. GTC also demonstrates its generalization ability by accurately classifying natural products. A user-friendly executable program with a graphical interface has been developed; the only input it requires is the SMILES string of the query compound.
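Because the program's only input is a SMILES string, the textual branch starts by tokenizing that string before a CNN reads it. The abstract does not describe GTC's tokenizer or filter design. The sketch below therefore assumes a SMILES tokenization regex adapted from Molecular Transformer-style models, one-hot token encoding, and a single 1D convolution with ReLU and global max pooling. The names `tokenize`, `one_hot` and `conv1d_maxpool` are hypothetical, and the hand-set filter stands in for learned weights.

```python
import re
from typing import Dict, List

# Regex adapted from SMILES tokenizers used in Molecular Transformer-style models
# (assumption: GTC's own tokenizer is not described in the abstract).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom/bond/ring tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def one_hot(tokens: List[str], vocab: Dict[str, int]) -> List[List[float]]:
    """Encode tokens as an L x V one-hot matrix (KeyError on out-of-vocabulary tokens)."""
    return [[1.0 if vocab[t] == i else 0.0 for i in range(len(vocab))] for t in tokens]


def conv1d_maxpool(x: List[List[float]], filters: List[List[List[float]]]) -> List[float]:
    """Slide each (width x V) filter along the sequence, apply ReLU, keep the maximum response."""
    features = []
    for f in filters:
        width = len(f)
        responses = [
            sum(f[o][v] * x[pos + o][v] for o in range(width) for v in range(len(x[0])))
            for pos in range(len(x) - width + 1)
        ]
        # ReLU + global max pooling; 0.0 when the sequence is shorter than the filter.
        features.append(max([0.0] + responses))
    return features


if __name__ == "__main__":
    vocab = {t: i for i, t in enumerate(["C", "O", "N", "c", "1", "(", ")", "="])}
    # Hand-set width-2 filter that fires on "C" followed by "O" (stand-in for a learned filter).
    co_filter = [[0.0] * len(vocab) for _ in range(2)]
    co_filter[0][vocab["C"]] = 1.0
    co_filter[1][vocab["O"]] = 1.0
    print(tokenize("c1ccccc1Cl"))  # ['c', '1', 'c', 'c', 'c', 'c', 'c', '1', 'Cl']
    print(conv1d_maxpool(one_hot(tokenize("CCO"), vocab), [co_filter]))  # [2.0]
```

Treating bracket atoms and two-letter elements such as `Cl` as single tokens keeps the CNN from confusing chlorine with a carbon followed by an unknown character.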

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51be/10507959/121343b05efb/12859_2023_5485_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51be/10507959/08ceeadccece/12859_2023_5485_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51be/10507959/74c0ee384928/12859_2023_5485_Fig2_HTML.jpg
