Benchmark Dataset for Training Machine Learning Models to Predict the Pathway Involvement of Metabolites.

Authors

Huckvale Erik D, Powell Christian D, Jin Huan, Moseley Hunter N B

Affiliations

Markey Cancer Center, University of Kentucky, Lexington, KY 40506, USA.

Superfund Research Center, University of Kentucky, Lexington, KY 40506, USA.

Publication information

Metabolites. 2023 Nov 1;13(11):1120. doi: 10.3390/metabo13111120.

DOI:10.3390/metabo13111120
PMID:37999216
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10673125/
Abstract

Metabolic pathways are human-defined groupings of life-sustaining biochemical reactions, with metabolites serving as both the reactants and the products of these reactions. However, many public datasets include identified metabolites whose pathway involvement is unknown, which hinders metabolic interpretation. To address this gap, various machine learning models, including those trained on data from the Kyoto Encyclopedia of Genes and Genomes (KEGG), have been developed to predict the pathway involvement of metabolites from their chemical descriptions. These prior models, however, are based on older KEGG-derived metabolite datasets, including one benchmark dataset that is invalid because it contains over 1500 duplicate entries. We have therefore developed a new benchmark dataset derived from KEGG that follows optimal standards of scientific computational reproducibility and includes all source code needed to update the benchmark dataset as KEGG changes. Using this new benchmark dataset with our atom coloring methodology, we developed and compared the performance of Random Forest, XGBoost, and multilayer perceptron with autoencoder models. The best overall weighted average performance across 1000 unique folds was an F1 score of 0.8180 and a Matthews correlation coefficient of 0.7933, achieved by XGBoost binary classification models for 11 KEGG-defined pathway categories.
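The two metrics reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the confusion counts, and the category labels are hypothetical, and weighting each category by its number of positive entries is an assumption, since the abstract does not say how the weighted average was formed.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def weighted_average(scores, weights):
    """Average of the scores, each weighted by the matching entry in weights."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical confusion counts (tp, tn, fp, fn) for two made-up categories,
# one binary classifier per category. These are not results from the paper.
counts = {
    "category_a": (80, 900, 10, 10),
    "category_b": (30, 950, 5, 15),
}

f1_values = [f1_score(tp, fp, fn) for tp, tn, fp, fn in counts.values()]
mcc_values = [mcc(tp, tn, fp, fn) for tp, tn, fp, fn in counts.values()]

# Assumed weighting: number of positive entries (tp + fn) in each category.
weights = [tp + fn for tp, tn, fp, fn in counts.values()]

print("weighted F1: ", round(weighted_average(f1_values, weights), 4))
print("weighted MCC:", round(weighted_average(mcc_values, weights), 4))
```

Unlike F1, the Matthews correlation coefficient also uses the true negatives, which is one reason to report both when positive entries are much rarer than negative ones in a category.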


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/a0a83c80399f/metabolites-13-01120-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/66845e97b96b/metabolites-13-01120-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/63b677410727/metabolites-13-01120-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/2dd310b1a2f2/metabolites-13-01120-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/b9b006eef151/metabolites-13-01120-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/173bcbe94a06/metabolites-13-01120-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/d13815cbf656/metabolites-13-01120-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/09304697eb22/metabolites-13-01120-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/df0d8c5b9420/metabolites-13-01120-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4143/10673125/673bebe8ffa4/metabolites-13-01120-g010.jpg


