Comparison of machine learning and conventional logistic regression-based prediction models for gestational diabetes in an ethnically diverse population; the Monash GDM Machine learning model.

Authors

Belsti Yitayeh, Moran Lisa, Du Lan, Mousa Aya, De Silva Kushan, Enticott Joanne, Teede Helena

Affiliations

Monash Centre for Health Research and Implementation (MCHRI), Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia; University of Gondar, College of Medicine and Health Science, Ethiopia.

Monash Centre for Health Research and Implementation (MCHRI), Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Australia.

Publication information

Int J Med Inform. 2023 Nov;179:105228. doi: 10.1016/j.ijmedinf.2023.105228. Epub 2023 Sep 21.

Abstract

BACKGROUND

Early identification of pregnant women at high risk of developing gestational diabetes (GDM) is desirable as effective lifestyle interventions are available to prevent GDM and to reduce associated adverse outcomes. Personalised probability of developing GDM during pregnancy can be determined using a risk prediction model. These models extend from traditional statistics to machine learning methods; however, accuracy remains sub-optimal.

OBJECTIVE

We aimed to compare multiple machine learning algorithms to develop GDM risk prediction models, then to determine the optimal model for predicting GDM.

METHODS

A supervised machine learning predictive analysis was performed on data from routine antenatal care at a large health service network from January 2016 to June 2021. The predictors in set 1 were sourced from the existing, internationally validated Monash GDM model: GDM history, body mass index, ethnicity, age, family history of diabetes, and past poor obstetric history. New models with different predictors were developed in line with statistical principles, including more robust continuous and derived variables. A randomly selected 80% of the dataset was used for model development, and the remaining 20% for validation. Performance measures, including calibration and discrimination metrics, were assessed. Decision curve analysis was performed.
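The evaluation steps named above (a random 80/20 split, a discrimination metric, a calibration-related score, and decision curve analysis) can be sketched in plain Python. This is a minimal illustration, not the study's code: the function names, the fixed seed, and the toy labels and probabilities are all assumptions, and the abstract does not say which software or exact metric definitions the authors used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into development and validation subsets (assumed 80/20)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(y_true, y_prob):
    """Discrimination: chance that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0) for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Decision curve analysis: net benefit of acting on everyone at or above threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


if __name__ == "__main__":
    # Toy labels and predicted probabilities, invented purely for illustration.
    y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.5, 0.3, 0.2, 0.8]
    development, validation = train_test_split(list(range(100)))
    print(len(development), len(validation))
    print(auc(y_true, y_prob), brier_score(y_true, y_prob))
    # A decision curve plots net benefit across a range of threshold probabilities.
    for t in (0.1, 0.3, 0.5):
        print(t, net_benefit(y_true, y_prob, t))
```

In a decision curve, the net benefit of the model is usually compared against the "treat all" and "treat none" strategies over the same thresholds.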

RESULTS

Upon internal validation, the area under the curve (AUC) of the machine learning and logistic regression models ranged from 71% to 93% across the different algorithms, with the CatBoost Classifier (CBC) performing best. At the default cut-off point of 0.32, the performance of CBC on predictor set 4 was: accuracy (85%), precision (90%), recall (78%), F1-score (84%), sensitivity (81%), specificity (90%), positive predictive value (92%), negative predictive value (78%), and Brier score (0.39).
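To make the threshold-based measures concrete, the sketch below derives them from a confusion matrix at a probability cut-off. The function name and the toy data are hypothetical; only the 0.32 cut-off comes from the abstract. In this simple binary formulation, recall is the same quantity as sensitivity and precision the same as positive predictive value. The abstract reports them separately with different values, and how the study computed each is not stated here.

```python
def metrics_at_cutoff(y_true, y_prob, cutoff=0.32):
    """Classify as positive when probability >= cutoff, then derive common metrics."""
    y_pred = [1 if p >= cutoff else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    precision = ratio(tp, tp + fp)      # positive predictive value
    recall = ratio(tp, tp + fn)         # sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),      # negative predictive value
    }


if __name__ == "__main__":
    # Toy labels and probabilities, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_prob = [0.9, 0.4, 0.2, 0.1, 0.5, 0.3, 0.05, 0.33]
    for name, value in metrics_at_cutoff(y_true, y_prob).items():
        print(f"{name}: {value:.2f}")
```

Lowering the cut-off generally raises sensitivity at the cost of specificity, which is why a screening-oriented model may use a value below 0.5.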

CONCLUSIONS

In this study, machine learning approaches outperformed traditional statistical methods, with predictive performance increasing from 75% to 93%. The CatBoost classifier performed best, using the model that included continuous variables.
