Ding Tianze, Liu Peijie, Jia Jie, Wu Hui, Zhu Jie, Yang Kefeng
Department of Clinical Nutrition, Xin Hua Hospital Affiliated to School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
Department of Clinical Nutrition, College of Health Science and Technology, School of Medicine, Shanghai Jiao Tong University, Shanghai, China.
Endocr Connect. 2024 Nov 21;13(12). doi: 10.1530/EC-24-0169. Print 2024 Dec 1.
Gestational diabetes mellitus (GDM) significantly affects pregnancy outcomes. Therefore, it is crucial to develop prediction models since they can guide timely interventions to reduce the incidence of GDM and its associated adverse effects.
A total of 554 pregnant women were enrolled, and their sociodemographic characteristics, clinical data and dietary data were collected. Dietary data were obtained with a validated semi-quantitative food frequency questionnaire (FFQ). We applied random forest mean decrease in impurity for feature selection, and the models were built using logistic regression, XGBoost, and LightGBM algorithms. The prediction performance of the models was compared by accuracy, sensitivity, specificity, area under the curve (AUC) and the Hosmer-Lemeshow test.
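The comparison metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names `threshold_metrics` and `hosmer_lemeshow_statistic`, the 0.5 cut-off, the grouping scheme and the toy labels and probabilities are all assumptions made for illustration only.

```python
from math import fsum


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity at a probability cut-off.

    The 0.5 default is an assumption; the abstract does not state a cut-off.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups.

    Subjects are sorted by predicted probability and split into `groups`
    roughly equal bins; observed and expected event counts are compared
    within each bin.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = fsum(p for p, _ in chunk)
        # Skip degenerate bins where the denominator would be zero.
        if 0 < expected < size:
            stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat


# Made-up labels (1 = GDM) and predicted probabilities, not study data.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

metrics = threshold_metrics(y_true, y_prob)
hl_stat = hosmer_lemeshow_statistic(y_true, y_prob, groups=2)
```

In the usual form of the test, the statistic is referred to a chi-square distribution with (groups - 2) degrees of freedom, and a large p-value is read as no evidence of poor calibration.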
Blood glucose, age, pre-pregnancy body mass index (BMI), triglycerides and high-density lipoprotein cholesterol (HDL) were the top five features according to the feature selection. Among the three algorithms, XGBoost performed best with an AUC of 0.788, LightGBM came second (AUC = 0.749), and logistic regression performed worst (AUC = 0.712). In addition, both XGBoost and LightGBM performed better when dietary information was included than on the non-dietary dataset (AUC 0.788 vs 0.718 for XGBoost; 0.749 vs 0.726 for LightGBM).
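To make the AUC values above easier to interpret: AUC is the probability that a randomly chosen woman with GDM receives a higher predicted score than a randomly chosen woman without GDM. The sketch below computes it by direct pairwise comparison. The function name `auc_score` and the toy data are hypothetical and unrelated to the study dataset.

```python
def auc_score(y_true, y_score):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    Ties in the score count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = GDM) and model scores, not study data.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

auc = auc_score(y_true, y_score)  # 15 of 16 pairs ranked correctly
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a difference such as 0.788 vs 0.718 means a few more pairs in every hundred are ordered correctly.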
XGBoost and LightGBM outperformed logistic regression in predicting GDM among Chinese pregnant women. In addition, dietary data may improve model performance, which deserves more in-depth investigation with a larger sample size.