Esmaily Habibollah, Tayefi Maryam, Doosti Hassan, Ghayour-Mobarhan Majid, Nezami Hossein, Amirabadizadeh Alireza
Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
Clinical Research Unit, Mashhad University of Medical Sciences, Mashhad, Iran.
J Res Health Sci. 2018 Apr 24;18(2):e00412.
BACKGROUND: We aimed to identify risk factors associated with type 2 diabetes mellitus (T2DM) using a data mining approach, namely decision tree and random forest techniques, applied to data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. STUDY DESIGN: A cross-sectional study. METHODS: The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, decision trees and random forests, were used to predict T2DM from other observed characteristics of 9528 subjects recruited from the MASHAD database. This paper compares the two models in terms of accuracy, sensitivity, specificity, and the area under the ROC curve. RESULTS: The prevalence of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. CONCLUSIONS: The random forest model, when used with demographic, clinical, anthropometric, and biochemical measurements, can provide a simple tool to identify risk factors associated with type 2 diabetes. Such identification can be of substantial use in shaping health policy to reduce the number of subjects with T2DM.
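The four comparison measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight-subject toy data are all assumptions made for illustration, and none of the numbers come from the MASHAD database. Labels use 1 for T2DM and 0 for no T2DM. The area under the ROC curve is computed through its rank interpretation, the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = T2DM, 0 = no T2DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diabetics correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diabetics missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-diabetics correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-diabetics wrongly flagged
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; tied scores count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): true status and model scores for 8 subjects.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]
# Assumed decision threshold of 0.5 turns scores into predicted classes.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the area under the ROC curve summarizes performance across all thresholds. With a prevalence near 14%, as reported here, accuracy alone can look favorable for a model that rarely predicts T2DM, which is one reason to report all four measures together.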