Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey.
Department of Physiology, College of Medicine, King Khalid University, Abha 61421, Saudi Arabia.
Nutrients. 2024 May 20;16(10):1537. doi: 10.3390/nu16101537.
This study aims to identify unique metabolomic biomarkers associated with type 2 diabetes (T2D) and to develop an accurate diagnostic model using tree-based machine learning (ML) algorithms integrated with bioinformatics techniques.
Univariate and multivariate analyses, including fold change, receiver operating characteristic (ROC) curve analysis, and partial least-squares discriminant analysis (PLS-DA), were used to identify biomarker metabolites whose concentrations differed significantly in T2D patients. Three tree-based algorithms known for their robustness in high-dimensional data analysis [eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Adaptive Boosting (AdaBoost)] were used to build a diagnostic model for T2D.
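The univariate screening step can be illustrated with a minimal sketch that computes a log2 fold change and a ROC AUC for a single metabolite. The concentrations below are hypothetical toy values, and the function names `log2_fold_change` and `roc_auc` are illustrative; the study's actual software, thresholds, and preprocessing are not stated here, so this is not the authors' pipeline.

```python
from math import log2
from statistics import mean


def log2_fold_change(cases, controls):
    """log2 of the ratio of mean concentrations (cases over controls)."""
    return log2(mean(cases) / mean(controls))


def roc_auc(cases, controls):
    """AUC as the probability that a random case exceeds a random control.

    Ties count as one half (the Mann-Whitney U formulation of the AUC).
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical concentrations for one metabolite (arbitrary units).
t2d = [4.0, 6.0, 8.0, 6.0]
control = [2.0, 3.0, 4.0, 3.0]

lfc = log2_fold_change(t2d, control)
auc = roc_auc(t2d, control)
print(f"log2 fold change = {lfc:.3f}, AUC = {auc:.3f}")
```

A metabolite with a large absolute fold change and an AUC well above 0.5 would be a candidate for further multivariate analysis; any cutoffs applied to these values would be a separate choice.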
The biomarker discovery process, validated with three different approaches, identified pyruvate, D-rhamnose, AMP, pipecolate, tetradecenoic acid, tetradecanoic acid, dodecanediothioic acid, prostaglandin E3/D3 (isobars), ADP, and hexadecenoic acid as potential biomarkers for T2D. The XGBoost model [accuracy = 0.831, F1-score = 0.845, sensitivity = 0.882, specificity = 0.774, positive predictive value (PPV) = 0.811, negative predictive value (NPV) = 0.857, and area under the ROC curve (AUC) = 0.887] had slightly higher performance measures than the other models.
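To show how the reported threshold-based measures relate to one another, the sketch below derives them from a 2x2 confusion matrix. The counts used (30 true positives, 4 false negatives, 7 false positives, 24 true negatives) are an assumption: they are one set of counts that happens to reproduce the reported values to three decimals, not figures taken from the study. The AUC cannot be recovered from a single confusion matrix and is omitted.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts chosen for illustration (not reported in the source).
metrics = diagnostic_metrics(tp=30, fn=4, fp=7, tn=24)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, sensitivity exceeds specificity, meaning that more controls are misclassified as T2D than T2D patients are missed.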
ML integrated with bioinformatics techniques offers an accurate approach to discovering candidate T2D biomarkers. The XGBoost model can successfully distinguish T2D patients on the basis of their metabolites.