Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population.

Author information

Odukoya Oluwakemi, Nwaneri Solomon, Odeniyi Ifedayo, Akodu Babatunde, Oluwole Esther, Olorunfemi Gbenga, Popoola Oluwatoyin, Osuntoki Akinniyi

Affiliations

Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria.

Department of Biomedical Engineering, College of Medicine, University of Lagos, Lagos State, Nigeria.

Publication information

Healthc Inform Res. 2022 Jan;28(1):58-67. doi: 10.4258/hir.2022.28.1.58. Epub 2022 Jan 31.

Abstract

OBJECTIVE

This study developed and compared the performance of three widely used predictive models (logistic regression [LR], artificial neural network [ANN], and decision tree [DT]) for predicting diabetes mellitus from the socio-demographic, lifestyle, and physical attributes of a Nigerian population.

METHODS

We developed three predictive models using 10 input variables. Data preprocessing included removal of missing values and outliers, min-max normalization, and feature extraction with principal component analysis. The models were trained and validated with 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1.
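The min-max normalization and 10-fold cross-validation steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the function names (`min_max_fit`, `min_max_apply`, `cross_validation_splits`) and the toy data are hypothetical, PCA and model fitting are omitted, folds are plain (unstratified) shuffled folds, and fitting the minima and maxima on the training folds only is a leakage-avoiding choice the abstract does not specify.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Row = Sequence[float]


def min_max_fit(rows: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Learn per-column minima and maxima from the (training) rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(rows: Sequence[Row], mins: List[float], maxs: List[float]) -> List[List[float]]:
    """Rescale each column as (x - min) / (max - min).

    A constant column (max == min) is mapped to 0.0 to avoid dividing by zero.
    Held-out values outside the fitted range can fall outside [0, 1].
    """
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def cross_validation_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield k (train_idx, test_idx) pairs so that every index is tested exactly once.

    Indices are shuffled with a fixed seed and dealt round-robin into k folds
    (plain, unstratified k-fold: an assumption, not the authors' stated setup).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical toy data: 6 participants x 2 numeric features.
X = [[20.0, 1.2], [35.0, 1.5], [50.0, 1.8], [65.0, 2.1], [41.0, 1.6], [58.0, 1.9]]
for train_idx, test_idx in cross_validation_splits(len(X), k=3):
    mins, maxs = min_max_fit([X[i] for i in train_idx])  # fit on training folds only
    X_train = min_max_apply([X[i] for i in train_idx], mins, maxs)
    X_test = min_max_apply([X[i] for i in test_idx], mins, maxs)
    # PCA and model fitting would follow here (not shown).
```

Fitting the scaler inside each split keeps the held-out fold's values from influencing the normalization, so the per-fold estimates are not optimistically biased by the preprocessing.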

RESULTS

The mean age of the participants was 50.52 ± 16.14 years. For LR, the classification accuracy, sensitivity, specificity, PPV, and NPV were 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%, respectively. The corresponding values were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83% for ANN and 99.05%, 99.76%, 98.08%, 98.77%, and 99.82% for DT. DT was the best-performing classifier (99.05% accuracy) and LR the poorest-performing (81.31% accuracy). Similarly, DT achieved the highest AUROC (0.992), followed by ANN (0.976) and LR (0.892).
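All of the threshold-based metrics reported above derive from the four confusion-matrix counts (true/false positives and negatives), while AUROC is computed from continuous model scores. The minimal Python sketch below shows these definitions; it assumes binary labels coded 1 (diabetes) and 0 (no diabetes), the function names are hypothetical, and it is not the authors' implementation. AUROC is computed through its pairwise (Mann-Whitney) interpretation, which is O(n_pos · n_neg) and suited only to small illustrations.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN), assuming 1 = diabetes and 0 = no diabetes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (zero denominator) are reported as NaN.
    return num / den if den else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random negative one.

    Ties count as 1/2 (the Mann-Whitney formulation); pairwise O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under 10-fold cross-validation these values would be computed on each held-out fold and then averaged or pooled across folds; the abstract does not say which aggregation the authors used.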

CONCLUSIONS

Our study demonstrated that DT, LR, and ANN models can be used effectively to predict diabetes mellitus in the Nigerian population from socio-demographic, lifestyle, and physical risk factors. An overall comparative analysis showed that the DT model outperformed both LR and ANN.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/763f/8850175/63c1dbc40001/hir-2022-28-1-58f1.jpg