Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China.
The Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai, China.
J Diabetes Res. 2020 Jun 12;2020:4168340. doi: 10.1155/2020/4168340. eCollection 2020.
Gestational diabetes mellitus (GDM) contributes to adverse pregnancy and birth outcomes. In recent decades, extensive research has been devoted to the early prediction of GDM by various methods. Machine learning methods are flexible prediction algorithms with potential advantages over conventional regression.
The purpose of this study was to use machine learning methods to predict GDM and to compare their performance with that of logistic regression.
We performed a retrospective, observational study of women who attended their routine first hospital visit during early pregnancy and underwent Down's syndrome screening at 16-20 gestational weeks in a tertiary maternity hospital in China between January 1, 2013 and December 31, 2017. A total of 22,242 singleton pregnancies were included, and 3182 (14.31%) women developed GDM. Candidate predictors included maternal demographic characteristics and medical history (maternal factors) and laboratory values in early pregnancy. The models were derived from the first 70% of the data and then validated on the remaining 30%. The same candidate variables were used to train both the machine learning models and the traditional logistic regression models. Eight common machine learning methods (GBDT, AdaBoost, LGB, Logistic, Vote, XGB, Decision Tree, and Random Forest) and two common regressions (stepwise logistic regression and logistic regression with RCS) were implemented to predict the occurrence of GDM. Models were compared on discrimination and calibration metrics.
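The derivation/validation split and the discrimination metric described above can be illustrated with a minimal, dependency-free sketch. It assumes that "first 70%" means the earliest records by visit date, which the abstract does not state explicitly; the field name `visit_date` and both function names are hypothetical and are not taken from the study.

```python
def chronological_split(records, train_fraction=0.7):
    """Split records into an earlier derivation set and a later validation set.

    Assumption: each record is a dict with a sortable 'visit_date' key
    (a hypothetical field name used only for this illustration).
    """
    ordered = sorted(records, key=lambda r: r["visit_date"])
    n_train = round(len(ordered) * train_fraction)
    return ordered[:n_train], ordered[n_train:]


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of cases, which is adequate for illustration; a rank-based implementation would be preferable for a cohort of this size.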
In the validation dataset, the machine learning and logistic regression models showed moderate performance (AUC 0.59-0.74). Overall, the GBDT model performed best among the machine learning methods (AUC 0.74, 95% CI 0.71-0.76), although the differences between the methods were negligible. Fasting blood glucose, HbA1c, triglycerides, and BMI contributed strongly to the prediction of GDM. In the GBDT model, a cutoff of 0.3 for the predicted value gave a negative predictive value of 74.1% (95% CI 69.5%-78.2%) and a sensitivity of 90% (95% CI 88.0%-91.7%), and a cutoff of 0.7 gave a positive predictive value of 93.2% (95% CI 88.2%-96.1%) and a specificity of 99% (95% CI 98.2%-99.4%).
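How sensitivity, specificity, and the predictive values follow from a chosen cutoff can be shown with a short sketch. It assumes a case is classified as positive when its predicted value is at or above the cutoff, which the abstract does not specify; the function name `threshold_metrics` is hypothetical, and the example input is toy data, not data from the study.

```python
def threshold_metrics(y_true, y_prob, cutoff):
    """Compute sensitivity, specificity, PPV and NPV at a given cutoff.

    Assumption: a case is called positive when its predicted value is
    greater than or equal to the cutoff.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a category is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy example with made-up labels and predicted values.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.2, 0.75, 0.4, 0.1, 0.1, 0.05]
low_cutoff = threshold_metrics(labels, predicted, 0.3)
high_cutoff = threshold_metrics(labels, predicted, 0.7)
```

Evaluating the same model at a low and a high cutoff, as in the last two lines, mirrors the two-threshold stratification reported above: the low cutoff is judged mainly on sensitivity and negative predictive value, the high cutoff on specificity and positive predictive value.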
In this study, we found that several machine learning methods did not outperform logistic regression in predicting GDM. We developed a model with cutoff points for risk stratification of GDM.