Real-data comparison of data mining methods in prediction of diabetes in iran.

Author information

Tapak Lily, Mahjub Hossein, Hamidi Omid, Poorolajal Jalal

Affiliation

Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran.

Publication information

Healthc Inform Res. 2013 Sep;19(3):177-85. doi: 10.4258/hir.2013.19.3.177. Epub 2013 Sep 30.

Abstract

OBJECTIVES

Diabetes is one of the most common non-communicable diseases in developing countries, and early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) in classifying persons with and without diabetes.

METHODS

The data set used in this study included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was drawn by cluster sampling of the Iranian population in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected, and the performance of the six classifiers was compared in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve.
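The four evaluation criteria named above can be computed from a classifier's outputs roughly as follows. This is a minimal Python sketch, not the authors' code: it assumes binary labels with 1 = diabetes (positive class) and 0 = no diabetes, and the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical. AUC is computed through its rank interpretation, the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one (ties counted as one half), which equals the area under the empirical ROC curve.

```python
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating label 1 ("diabetes") as the positive class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1          # diabetic, predicted diabetic
        elif t == 1:
            fn += 1          # diabetic, missed
        elif p == 0:
            tn += 1          # non-diabetic, correctly cleared
        else:
            fp += 1          # non-diabetic, flagged as diabetic
    return tp, fn, tn, fp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity and total accuracy (raises ZeroDivisionError if a class is absent)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),                  # true positive rate
        "specificity": tn / (tn + fp),                  # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),    # fraction classified correctly
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:                # O(n_pos * n_neg) pairwise comparison
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1]
    scores = [0.9, 0.4, 0.3, 0.2, 0.6]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

For the toy inputs, sensitivity is 0.5, specificity is 2/3, accuracy is 0.6, and AUC is 5/6. The pairwise loop is kept for clarity; on a data set the size of this one, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be faster.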

RESULTS

Support vector machines showed the highest total accuracy (0.986) and the largest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). All other methods achieved total accuracy above 85%, but their sensitivity values were very low (less than 0.350).
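High total accuracy combined with low sensitivity is what class imbalance tends to produce: when few subjects have diabetes, correctly clearing the many non-diabetic subjects dominates accuracy, because total accuracy equals prevalence × sensitivity + (1 - prevalence) × specificity. The sketch below uses hypothetical counts (1,000 subjects at an assumed 10% prevalence, chosen only for illustration and not taken from the study) and a hypothetical helper `metrics_from_counts` to show how a classifier with sensitivity 0.30 can still exceed 85% accuracy.

```python
def metrics_from_counts(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and total accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical cohort (illustrative only): 100 diabetic and 900 non-diabetic subjects.
# Weak classifier: detects 30 of 100 diabetics and wrongly flags 10 of 900 non-diabetics.
weak = metrics_from_counts(tp=30, fn=70, tn=890, fp=10)

# Trivial baseline that labels every subject "no diabetes".
baseline = metrics_from_counts(tp=0, fn=100, tn=900, fp=0)

# Total accuracy is the prevalence-weighted average of sensitivity and specificity.
prevalence = 100 / 1000
weighted = prevalence * weak["sensitivity"] + (1 - prevalence) * weak["specificity"]

if __name__ == "__main__":
    print(weak)
    print(baseline)
    print(weighted)
```

Under these assumed counts the weak classifier reaches 0.92 accuracy with sensitivity 0.30, and the trivial baseline reaches 0.90 accuracy while detecting no cases at all, which illustrates why accuracy alone can be misleading for screening.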

CONCLUSIONS

The results of this study indicate that, in terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranks first among all the classifiers tested for predicting diabetes. This approach is therefore a promising classifier for predicting diabetes and should be investigated further for the prediction of other diseases.

Similar articles

1
Comparing different algorithms for the course of Alzheimer's disease using machine learning.
Ann Palliat Med. 2021 Sep;10(9):9715-9724. doi: 10.21037/apm-21-2013.
2
Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms.
Comput Methods Programs Biomed. 2019 Jul;176:173-193. doi: 10.1016/j.cmpb.2019.04.008. Epub 2019 Apr 10.
3
Performance of machine learning methods in diagnosing Parkinson's disease based on dysphonia measures.
Biomed Eng Lett. 2017 Oct 12;8(1):29-39. doi: 10.1007/s13534-017-0051-2. eCollection 2018 Feb.

Cited by

1
Validating Machine Learning Models Against the Saline Test Gold Standard for Primary Aldosteronism Diagnosis.
JACC Asia. 2024 Nov 12;4(12):972-984. doi: 10.1016/j.jacasi.2024.09.010. eCollection 2024 Dec.
2
Artificial intelligence in diabetes management: Advancements, opportunities, and challenges.
Cell Rep Med. 2023 Oct 17;4(10):101213. doi: 10.1016/j.xcrm.2023.101213. Epub 2023 Oct 2.
3
A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain.
Healthcare (Basel). 2023 Apr 28;11(9):1268. doi: 10.3390/healthcare11091268.
4
Hyperglycemia screening based on survey data: an international instrument based on WHO STEPs dataset.
BMC Endocr Disord. 2022 Dec 14;22(1):316. doi: 10.1186/s12902-022-01222-0.
5
Analysis of risk factors associated with acute respiratory infections among under-five children in Uganda.
BMC Public Health. 2022 Jun 17;22(1):1209. doi: 10.1186/s12889-022-13532-y.
6
Factors Associated with In Vitro Fertilization Live Birth Outcome: A Comparison of Different Classification Methods.
Int J Fertil Steril. 2021 Apr;15(2):128-134. doi: 10.22074/IJFS.2020.134582. Epub 2021 Mar 11.
7
Machine Learning Strategy for Gut Microbiome-Based Diagnostic Screening of Cardiovascular Disease.
Hypertension. 2020 Nov;76(5):1555-1562. doi: 10.1161/HYPERTENSIONAHA.120.15885. Epub 2020 Sep 10.
