Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China.
University of Texas Health Science Center at Houston, Houston, Texas, USA.
BMC Med Inform Decis Mak. 2019 Mar 12;19(1):41. doi: 10.1186/s12911-019-0790-3.
Prediction or early diagnosis of diabetes is crucial for populations at high risk of diabetes.
In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features: age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from the annual physical examination reports of adults at Shengjing Hospital of China Medical University from January to April 2017. The Weka data mining software was used to identify the best algorithm for diabetes classification.
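J48 is Weka's implementation of the C4.5 decision-tree algorithm, which picks the feature for each split by gain ratio: information gain divided by the split information of the feature. The sketch below illustrates only that criterion for categorical features; real C4.5 also handles numeric thresholds, missing values, and pruning, and restricts candidates to features with at least average information gain, none of which is shown here. The helper names, feature names, and toy records are hypothetical and are not the study's data or code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5-style gain ratio of one categorical feature with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # partition labels by feature value
    # Information gain: parent entropy minus size-weighted entropy of the partitions
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features that fragment the data into many values
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records, NOT the study's data (feature values: 1 = yes, 0 = no)
labels = ["no", "no", "yes", "yes"]
features = {
    "age_over_45": [0, 0, 1, 1],
    "salty_food": [1, 0, 1, 0],
}
# Rank features the way a tree would consider them for the root split
ranking = sorted(features, key=lambda f: gain_ratio(features[f], labels), reverse=True)
print(ranking)  # ['age_over_45', 'salty_food']
```

In this toy case the first feature separates the classes perfectly (gain ratio 1.0) while the second carries no information (gain ratio 0.0), so a tree built on these records would split on the first feature at the root.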
The results indicate that the decision tree classifier J48 performed best (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most important feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke.
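The reported recall (0.950) matches the accuracy (0.9503) to three decimals. That is what one would expect if, as we assume here, the figures are Weka's class-weighted averages, where each class's metric is weighted by its share of actual instances: weighted recall then reduces algebraically to overall accuracy. The sketch below computes these quantities from a confusion matrix under that assumption; the function name and the example matrix are hypothetical and are not the study's results.

```python
def weighted_metrics(cm):
    """Accuracy plus class-frequency-weighted precision, recall and F-measure.

    cm[i][j] is the number of instances whose actual class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    precision = recall = f_measure = 0.0
    for c in range(k):
        actual = sum(cm[c])                          # row sum: true instances of class c
        predicted = sum(cm[i][c] for i in range(k))  # column sum: predictions of class c
        tp = cm[c][c]
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = actual / total  # weight = share of all instances that truly belong to class c
        precision += w * p
        recall += w * r
        f_measure += w * f
    return {"accuracy": correct / total, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical 2x2 matrix, NOT the study's results: index 0 = no diabetes, 1 = diabetes
m = weighted_metrics([[90, 10], [5, 45]])
print(m)
```

Because weighted recall is the sum over classes of (n_c / N) * (TP_c / n_c), it equals the total number of correct predictions divided by N, so in this example it agrees with the accuracy of 0.9 up to floating-point rounding, while weighted precision and F-measure can differ from it.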
Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and will enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which could improve early diabetes diagnosis and reduce the burden on the healthcare system.