Chemical representation standardization needed to generalize metabolic pathway involvement prediction across the Kyoto Encyclopedia of Genes and Genomes, Reactome, and MetaCyc knowledgebases.

Authors

Huckvale Erik D, Moseley Hunter N B

Affiliations

Markey Cancer Center, University of Kentucky, Lexington, KY, USA.

Superfund Research Center, University of Kentucky, Lexington, KY, USA.

Publication

bioRxiv. 2025 Apr 8:2025.04.02.646918. doi: 10.1101/2025.04.02.646918.

DOI: 10.1101/2025.04.02.646918
PMID: 40291671
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12026579/
Abstract

MOTIVATION

Due to the utility of knowing the pathway involvement of compounds detected in biological experiments, knowledgebases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and MetaCyc have aggregated pathway annotations of compounds. However, these annotations are largely incomplete and are costly to obtain experimentally and curate from published scientific literature.

RESULTS

We constructed a new dataset using compounds and their pathway annotations from KEGG, Reactome, and MetaCyc. Using this dataset, we trained and tested an extreme classification model that classifies 8,195 unique pathways based on compound chemical representations, with a mean Matthews correlation coefficient (MCC) of 0.9036 ± 0.0033. During model evaluation, we discovered an inconsistency in chemical representations across knowledgebases, which was alleviated by standardizing the chemical representations using InChI (IUPAC International Chemical Identifier) canonicalization. Next, we compared the MCC between compounds and their cross-knowledgebase references. With non-standardized chemical representations, the MCC dropped by 0.2687; with standardized chemical representations, it dropped by only 0.0384. Thus, standardizing chemical representation is an essential step when predicting on novel chemical representations.
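For readers unfamiliar with the metric, the MCC values quoted above can be read against the standard binary definition, sketched below in plain Python. This is only an illustration of the formula: the abstract does not say how the authors aggregate MCC across pathways or cross-validation folds, and the function name `matthews_corrcoef` and the toy labels are hypothetical, not taken from the authors' code.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Binary MCC from paired 0/1 labels; returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical toy data: one entry per (compound, pathway) pair,
# 1 = compound is annotated to / predicted in the pathway.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(matthews_corrcoef(y_true, y_pred), 4))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). It stays informative when positives are rare, which is typical when most compound and pathway pairs are negative.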

AVAILABILITY AND IMPLEMENTATION

All code and data for reproducing the results of this manuscript are available in the following figshare items:

Manuscript main results: https://doi.org/10.6084/m9.figshare.28701845

CV analysis of model and dataset of prior studies: https://doi.org/10.6084/m9.figshare.28701590

