Gibbons Kristen S, Chang Allan M Z, Ma Ronald C W, Tam Wing Hung, Catalano Patrick M, Sacks David A, Lowe Julia, McIntyre H David
Faculty of Medicine, The University of Queensland, South Brisbane, Q 4051, Australia.
Department of Obstetrics and Gynecology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong Special Administrative Region.
Diabetes Res Clin Pract. 2021 Aug;178:108975. doi: 10.1016/j.diabres.2021.108975. Epub 2021 Jul 22.
Using data from a large multi-centre cohort, we aimed to create a risk prediction model for large-for-gestational age (LGA) infants, using both logistic regression and naïve Bayes approaches, and compare the utility of these two approaches.
We compared two techniques that underpin machine learning, logistic regression (LR) and naïve Bayes (NB), in their ability to predict large-for-gestational age (LGA) infants. Using data from five centres involved in the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, we developed LR and NB models and compared their predictive ability and stability. Candidate predictors were hyperglycaemia (assessed in three forms: IADPSG GDM yes/no, GDM subtype, and OGTT z-score quintiles) together with demographic and clinical variables.
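As a rough illustration of the two techniques being compared, the sketch below fits a Bernoulli naïve Bayes model and a logistic regression model to the same data and prints the predicted risk for each predictor profile. It is not the authors' implementation: the data are simulated rather than HAPO data, the two binary predictors (`gdm`, `obese`), their effect sizes, and all function names are hypothetical, and the logistic model is fitted by plain gradient descent for brevity.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_naive_bayes(X, y, alpha=1.0):
    """Bernoulli naive Bayes on 0/1 features with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)  # assumes both classes are present
        # Smoothed estimate of P(feature j = 1 | class c)
        p_one = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (prior, p_one)
    return model


def predict_naive_bayes(model, x):
    """Posterior probability of class 1, treating features as independent."""
    log_post = {}
    for c, (prior, p_one) in model.items():
        lp = math.log(prior)
        for xj, pj in zip(x, p_one):
            lp += math.log(pj if xj == 1 else 1.0 - pj)
        log_post[c] = lp
    top = max(log_post.values())  # subtract the max for numerical stability
    e0, e1 = math.exp(log_post[0] - top), math.exp(log_post[1] - top)
    return e1 / (e0 + e1)


def fit_logistic(X, y, step=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wj - step * gj / n for wj, gj in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b


def predict_logistic(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Simulated cohort: hypothetical predictors and effect sizes, not HAPO data.
random.seed(0)
X, y = [], []
for _ in range(1000):
    gdm = 1 if random.random() < 0.3 else 0
    obese = 1 if random.random() < 0.3 else 0
    true_risk = sigmoid(-2.0 + 1.5 * gdm + 1.5 * obese)
    X.append([gdm, obese])
    y.append(1 if random.random() < true_risk else 0)

nb_model = fit_naive_bayes(X, y)
lr_model = fit_logistic(X, y)

profiles = [[0, 0], [1, 0], [0, 1], [1, 1]]
nb_risks = [predict_naive_bayes(nb_model, p) for p in profiles]
lr_risks = [predict_logistic(lr_model, p) for p in profiles]
for p, nb, lr in zip(profiles, nb_risks, lr_risks):
    print(p, round(nb, 3), round(lr, 3))
```

The main structural difference is visible in the code: naïve Bayes estimates each predictor's distribution within each outcome class separately and multiplies them, whereas logistic regression estimates all coefficients jointly.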
The two approaches produced similar estimates of LGA risk (intraclass correlation coefficient 0.955; 95% CI 0.952 to 0.958); however, the AUROC for the LR model was significantly higher (0.698 vs 0.682; p < 0.001). Among the three LR models, the one using individual OGTT z-score quintiles had a statistically significantly higher AUROC than the other two.
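For readers less familiar with the AUROC, it can be read as the probability that a randomly chosen case (here, an LGA infant) receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auroc` and the toy inputs are illustrative assumptions and are unrelated to the study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    labels: iterable of 0/1 outcomes; scores: predicted risks in the same order.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # case ranked above non-case
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 case/non-case pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
print(example)
```

On this scale, 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which puts the reported values of 0.698 and 0.682 in context.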
Logistic regression can be used with confidence to assess the relationship between clinical and biochemical variables and outcome.