Comparative analysis of machine learning techniques in metabolomic-based preterm birth prediction.

Author information

Han Ying-Chieh, Shearer Jane, Mu Chunlong, Slater Donna M, Tough Suzanne C, Duggan Gavin E

Affiliations

Department of Biomedical Engineering, Faculty of Engineering, University of Calgary, 2500 University Dr. NW, Calgary, AB T2N 1N4, Canada.

Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, 3330 Hospital Drive, NW, Calgary, AB T2N 4N1, Canada.

Publication information

Comput Struct Biotechnol J. 2025 Jul 13;27:3240-3250. doi: 10.1016/j.csbj.2025.07.010. eCollection 2025.

Abstract

BACKGROUND

With advances in algorithms and computing power, machine learning (ML) is increasingly used in life science research. This study investigated how effectively several ML models predict preterm birth from untargeted metabolomic profiles of serum collected during the third trimester of gestation.

METHODS

Serum samples from 48 mothers who delivered preterm and 102 who delivered at term, drawn from the All Our Families Cohort (Calgary, AB), were examined. Four ML algorithms, Partial Least Squares Discriminant Analysis (PLS-DA), linear logistic regression, artificial neural networks (ANN) and Extreme Gradient Boosting (XGBoost), were applied with and without bootstrap resampling to this small-scale clinical dataset to assess both model performance and metabolite interpretation.
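The abstract does not say exactly how bootstrap resampling was applied. As one plausible reading, assumed here, each bootstrap round draws a training set with replacement within each class, so the 48:102 preterm-to-term ratio of the cohort is preserved, and the samples never drawn (the "out-of-bag" samples) serve as a held-out test set. The function name and seed are illustrative, not taken from the study.

```python
import random

def stratified_bootstrap_split(labels, seed=0):
    """Draw one bootstrap training set (with replacement) within each class
    and return the out-of-bag indices as a held-out evaluation set.
    Hypothetical helper: a sketch, not the study's resampling code."""
    rng = random.Random(seed)
    train = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample len(idx) indices with replacement from this class only,
        # so the class ratio of the training set matches the cohort.
        train.extend(rng.choices(idx, k=len(idx)))
    in_bag = set(train)
    # Indices never drawn in this round form the out-of-bag test set.
    oob = [i for i in range(len(labels)) if i not in in_bag]
    return train, oob

# Cohort sizes from the abstract: 48 preterm (label 1) and 102 term (label 0).
labels = [1] * 48 + [0] * 102
train, oob = stratified_bootstrap_split(labels, seed=42)
print(len(train), sum(labels[i] for i in train), len(oob))
```

Each round yields 150 training draws containing exactly 48 preterm draws; repeating the split with different seeds and averaging the out-of-bag scores gives a resampled performance estimate.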

RESULTS

Model performance was evaluated using confusion matrices, the area under the receiver operating characteristic curve (AUROC) and feature importance rankings. Linear models such as PLS-DA and logistic regression showed moderate classification performance (AUROC ≈ 0.60), whereas non-linear approaches, including ANN and XGBoost, gave marginal improvements. XGBoost combined with bootstrap resampling performed best of all models, with an AUROC of 0.85 (95% CI: 0.57-0.99, p < 0.001), indicating a significant improvement in classification accuracy. Metabolite importance derived from Shapley Additive Explanations (SHAP) consistently identified acylcarnitines and amino acid derivatives as the principal discriminative features. Pathway analysis associated preterm delivery with disruptions in tyrosine metabolism and in phenylalanine, tyrosine and tryptophan biosynthesis.
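The AUROC equals the probability that a randomly chosen preterm case receives a higher model score than a randomly chosen term case. The abstract reports a 95% CI for the AUROC but does not state how it was computed; a percentile bootstrap is one common choice and is assumed in this sketch. The scores in the usage example are hypothetical toy values, not study data.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample subjects with replacement, keeping label/score pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples containing only one class
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical toy data: 1 = preterm, 0 = term.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 3))  # → 0.889
print(bootstrap_auroc_ci(toy_labels, toy_scores, n_boot=1000))
```

With only a few subjects the bootstrap interval is very wide, which is consistent with the broad 0.57-0.99 interval reported for a 150-sample cohort.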

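SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory. For tree ensembles such as XGBoost these are usually computed with the TreeSHAP algorithm (for example via `TreeExplainer` in the `shap` package). The sketch below instead computes exact Shapley values by brute-force enumeration of feature coalitions, which is feasible only for a handful of features. The risk model, its weights and the three "metabolite" features are hypothetical, and features outside a coalition are set to a baseline value, which is one of several conventions SHAP implementations use.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).
    Features outside a coalition are replaced by their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            for subset in combinations(others, k):
                s = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def risk_score(z):
    """Hypothetical model over three standardized metabolite levels,
    with an interaction between features 0 and 2."""
    return 0.8 * z[0] + 0.5 * z[1] + 0.3 * z[0] * z[2]

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk_score, x, baseline)
print([round(p, 3) for p in phi])  # → [0.95, 1.0, 0.15]
```

The interaction term's contribution (0.3) is split equally between features 0 and 2, and the values sum to `risk_score(x) - risk_score(baseline)`, the efficiency property that makes SHAP rankings additive. Enumeration cost grows as 2^n, which is why tree-specific algorithms are used on full metabolomic panels.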
CONCLUSIONS

Our results highlight the complexity of metabolomics-based modelling for preterm birth and support an iterative, model-driven approach for optimizing predictive accuracy in small-scale clinical datasets.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d85f/12312043/570bdc202590/ga1.jpg