Pilot-Study to Explore Metabolic Signature of Type 2 Diabetes: A Pipeline of Tree-Based Machine Learning and Bioinformatics Techniques for Biomarkers Discovery.

Affiliations

Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey.

Department of Physiology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia.

Publication information

Nutrients. 2024 May 20;16(10):1537. doi: 10.3390/nu16101537.

DOI:10.3390/nu16101537

PMID:38794775

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11124278/

Abstract

BACKGROUND

This study aims to identify unique metabolomic biomarkers associated with Type 2 Diabetes (T2D) and to develop an accurate diagnostic model using tree-based machine learning (ML) algorithms integrated with bioinformatics techniques.

METHODS

Univariate and multivariate analyses, including fold change, receiver operating characteristic (ROC) curve analysis, and Partial Least-Squares Discriminant Analysis (PLS-DA), were used to identify biomarker metabolites whose concentrations differed significantly in T2D patients. Three tree-based algorithms that have demonstrated robustness in high-dimensional data analysis [eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost)] were used to build a diagnostic model for T2D.
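As an illustration of the univariate screening step, the sketch below computes a log2 fold change and a ROC AUC for a single metabolite. It is a minimal example under stated assumptions, not the authors' pipeline: the function names `log2_fold_change` and `roc_auc` and the concentration values are hypothetical, and the abstract does not say which software or thresholds were used.

```python
import math


def log2_fold_change(case_values, control_values):
    """Return log2 of the ratio of group means (case over control)."""
    mean_case = sum(case_values) / len(case_values)
    mean_control = sum(control_values) / len(control_values)
    return math.log2(mean_case / mean_control)


def roc_auc(case_values, control_values):
    """Return the AUC as the probability that a randomly chosen case
    has a higher value than a randomly chosen control (ties count 0.5)."""
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical concentrations in arbitrary units, NOT data from the study.
t2d = [4.0, 5.0, 6.0, 9.0]
control = [2.0, 3.0, 4.0, 3.0]

lfc = log2_fold_change(t2d, control)
auc = roc_auc(t2d, control)
print(f"log2 fold change = {lfc:.3f}, AUC = {auc:.3f}")
```

In a screening of this kind, a metabolite would typically be carried forward only if both statistics pass cut-offs chosen by the analyst; any such cut-offs would be an assumption here, since none are given in the abstract.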

RESULTS

The biomarker discovery process, validated with three different approaches, identified Pyruvate, D-Rhamnose, AMP, pipecolate, Tetradecenoic acid, Tetradecanoic acid, Dodecanediothioic acid, Prostaglandin E3/D3 (isobars), ADP and Hexadecenoic acid as potential biomarkers for T2D. The XGBoost model [accuracy = 0.831, F1-score = 0.845, sensitivity = 0.882, specificity = 0.774, positive predictive value (PPV) = 0.811, negative predictive value (NPV) = 0.857, and area under the ROC curve (AUC) = 0.887] showed slightly higher performance than the other two models.
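The relationship between the reported performance measures can be made concrete with a small worked example. The sketch below derives each measure from confusion-matrix counts. The counts (30 true positives, 4 false negatives, 24 true negatives, 7 false positives) are an assumption: they were chosen because they reproduce the rounded values quoted above, and they are not reported in the abstract. The function name `diagnostic_metrics` is likewise hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, assumed for illustration; not taken from the study.
metrics = diagnostic_metrics(tp=30, fn=4, tn=24, fp=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The AUC is not included because it depends on the ranking of predicted scores across all decision thresholds and cannot be recovered from a single confusion matrix.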

CONCLUSIONS

ML integrated with bioinformatics techniques enables accurate discovery of candidate T2D biomarkers. The XGBoost model can successfully distinguish T2D based on metabolite profiles.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f89/11124278/e2a69e91fad1/nutrients-16-01537-g001.jpg
