Jiang Liangjun, Yang Zerui, Liu Gang, Xia Zhenhua, Yang Guangyao, Gong Haimei, Wang Jing, Wang Lei
College of Information and Communication Engineering, State Key Lab of Marine Resource Utilization in South China Sea, Hainan University, Haikou, China.
School of Electronics and Information, Yangtze University, Jingzhou, China.
Front Public Health. 2024 Feb 23;12:1328353. doi: 10.3389/fpubh.2024.1328353. eCollection 2024.
The prevalence of diabetes, a common chronic disease, is gradually increasing, placing a substantial burden on both society and individuals. To improve the effectiveness of diabetes risk prediction questionnaires, optimize the selection of feature variables, and raise residents' awareness of diabetes risk, this study uses survey data from the risk factor monitoring system of the Centers for Disease Control and Prevention in the United States.
Following univariate analysis and careful screening, a refined dataset was constructed. The dataset was preprocessed by standardizing the data distribution, balancing the classes with the Synthetic Minority Oversampling Technique (SMOTE) combined with a rounding function, and standardizing the data. Machine learning (ML) techniques were then applied, using enumerated feature variables to evaluate the strength of the correlation among diabetes risk factors.
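The abstract does not give implementation details for the balancing step, so the following is only a minimal sketch of the general idea: SMOTE-style interpolation between minority-class rows, rounding of the synthetic values so that coded questionnaire answers remain integers, and z-score standardization afterwards. The function names `smote_round` and `standardize`, the neighbour count `k`, and the toy rows are all hypothetical and are not taken from the study. In practice a library implementation of SMOTE would normally be used instead of this hand-written version.

```python
import math
import random


def smote_round(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation,
    rounding each value so coded survey answers stay integers.
    (Hypothetical helper; not the study's own code.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base row, excluding itself.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            [round(b + gap * (n - b)) for b, n in zip(base, neighbour)]
        )
    return synthetic


def standardize(rows):
    """Scale every column to zero mean and unit variance (z-score).
    Constant columns are left at zero by using a divisor of 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data (assumed): columns are [bmi_category, age_group, high_bp].
majority = [[1, 3, 0], [2, 5, 0], [1, 2, 0], [2, 4, 1], [1, 6, 0], [2, 3, 0]]
minority = [[3, 9, 1], [4, 10, 1], [3, 11, 1]]

# Oversample the minority class until both classes have the same size.
new_rows = smote_round(minority, n_new=len(majority) - len(minority))
balanced = majority + minority + new_rows
scaled = standardize(balanced)
```

Rounding after interpolation keeps each synthetic answer inside the range spanned by the two real rows it was built from, which is one plausible reason to pair SMOTE with a rounding function when the variables are categorical codes.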
The findings rank the feature variables that significantly influence the risk of diabetes. Obesity emerges as the most influential factor, outweighing the other risk factors. Psychological factors, advanced age, high cholesterol, high blood pressure, alcohol abuse, coronary heart disease or myocardial infarction, mobility difficulties, and low family income are also correlated with diabetes risk to varying degrees.
The experimental data show that, at comparable accuracy, optimizing the questionnaire variables and the number of questions can significantly improve efficiency for subsequent follow-up and targeted diabetes prevention. The methods used here may also inform studies of risk correlation in other diseases, and the results can help raise public awareness of populations at elevated risk of diabetes.