Department of Respiratory and Critical Care Medicine, The Affiliated Hospital of Qingdao University, Qingdao, China.
School of Nursing, Qingdao University, Qingdao, China.
BMC Pregnancy Childbirth. 2021 Dec 8;21(1):814. doi: 10.1186/s12884-021-04295-2.
Gestational diabetes mellitus (GDM) is one of the critical causes of adverse perinatal outcomes. A reliable estimate of GDM risk in early pregnancy would facilitate intervention plans for maternal and infant health care and help reduce the risk of adverse perinatal outcomes. This study aimed to build a model that predicts GDM in the first trimester for use in primary health care centres.
Characteristics of pregnant women in the first trimester were collected in eastern China from 2017 to 2019. Univariate analysis was performed using SPSS 23.0. Characteristics were compared using the Mann-Whitney U test for continuous variables and the chi-square test for categorical variables. All tests were two-sided, with p < 0.05 indicating statistical significance. The train_test_split function in Python was used to split the data set into 70% for training and 30% for testing. Random Forest and Logistic Regression models in Python were fitted to the training data set. Model performance was assessed with 10-fold cross-validation using the area under the ROC curve (AUC), diagnostic accuracy, sensitivity, and specificity.
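The modelling workflow described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the data set is generated with make_classification to mimic the sample size and approximate GDM prevalence, and the stratified split, the hyperparameters, the random seeds, and the choice to run cross-validation on the training portion only are all assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic stand-in for the study data (assumption): 1,139 rows,
# 7 predictors, roughly 16% positive (GDM) cases.
X, y = make_classification(
    n_samples=1139, n_features=7, n_informative=5, n_redundant=0,
    weights=[0.84, 0.16], random_state=0,
)

# 70% training / 30% test split; stratification is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Sensitivity is recall on the positive class,
# specificity is recall on the negative class.
scoring = {
    "auc": "roc_auc",
    "accuracy": "accuracy",
    "sensitivity": make_scorer(recall_score, pos_label=1),
    "specificity": make_scorer(recall_score, pos_label=0),
}

# Hyperparameters are illustrative, not taken from the study.
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# 10-fold cross-validation on the training set (assumed design).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X_train, y_train, cv=cv, scoring=scoring)
    results[name] = {
        metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
        for metric in scoring
    }
    for metric, (mean, std) in results[name].items():
        print(f"{name:20s} {metric:12s} {mean:.3f} ± {std:.3f}")
```

Reporting each metric as mean ± standard deviation over the 10 folds matches the form of the figures given in the results, though the numbers printed here come from synthetic data and carry no clinical meaning.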
A total of 1,139 pregnant women (186 with GDM) were included in the final data analysis. Significant differences were observed between participants with and without GDM in age (Z=-2.693, p=0.007), pre-pregnancy BMI (Z=-5.502, p<0.001), abdomen circumference in the first trimester (Z=-6.069, p<0.001), gravidity (Z=-3.210, p=0.001), PCOS (χ²=101.024, p<0.001), irregular menstruation (χ²=6.578, p=0.010), and family history of diabetes (χ²=15.266, p<0.001). The Random Forest model achieved a higher AUC than the Logistic Regression model (0.777±0.034 vs 0.755±0.032). With the figures read in the same order, it also had a higher specificity (0.813±0.075 vs 0.736±0.087) but a slightly lower sensitivity (0.651±0.087 vs 0.683±0.084).
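The group comparisons reported above come from Mann-Whitney U and chi-square tests. As a rough sketch of how such tests can be run in Python with SciPy (the study itself used SPSS), the example below uses made-up values: the BMI samples and the 2x2 family-history table are hypothetical and only the group sizes (186 and 953) follow from the abstract. Note that SciPy returns the U statistic rather than the Z value that SPSS reports, and that turning off the continuity correction is an assumption.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(0)

# Hypothetical continuous variable (e.g. pre-pregnancy BMI) for each group.
bmi_gdm = rng.normal(loc=24.0, scale=3.5, size=186)
bmi_non_gdm = rng.normal(loc=22.0, scale=3.0, size=953)

# Two-sided Mann-Whitney U test for a continuous variable.
u_stat, p_continuous = mannwhitneyu(bmi_gdm, bmi_non_gdm, alternative="two-sided")

# Hypothetical 2x2 table: rows = GDM / non-GDM,
# columns = with / without a family history of diabetes.
table = np.array([[40, 146],
                  [110, 843]])

# Pearson chi-square test without continuity correction (assumption).
chi2, p_categorical, dof, expected = chi2_contingency(table, correction=False)

print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_continuous:.4g}")
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p_categorical:.4g}")
```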
This research developed a simple model to predict the risk of GDM using a machine learning algorithm based on pre-pregnancy BMI, abdomen circumference in the first trimester, age, PCOS, gravidity, irregular menstruation, and family history of diabetes. The model is easy to operate, and all predictors can be readily obtained in the first trimester in primary health care centres.