Xue Yicheng, Chen Silong, Zhang Mengmeng, Cai Xiaojuan, Zheng Jialian, Wang Shihua, Chen Yan
The Medical School of Jiaxing University, Jiahang Road, Jiaxing, China.
The Medical Examination Center of the First Hospital of Jiaxing, Jiaxing, China.
Iran J Public Health. 2022 May;51(5):999-1009. doi: 10.18502/ijph.v51i5.9415.
We aimed to identify high-risk factors for stroke using logistic regression analysis and the LightGBM algorithm separately, and to compare the results of the two models to guide stroke prevention.
Samples of residents older than 40 years of age were collected from two medical examination centers in Jiaxing, China, from 2018 to 2019. Of the 2124 subjects, 1059 were middle-aged (40-59 years old) and 1065 were elderly (≥60 years old). Their demographic characteristics, medical history, family history, eating habits, and other variables were recorded and entered separately into logistic regression analysis and the LightGBM algorithm to build prediction models for the population at high risk of stroke. Four metrics (F1 score, accuracy, recall, and AUROC) were compared between the two models.
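As an illustration of the four comparison metrics named above, the following is a minimal pure-Python sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example, and the numbers are not study data. AUROC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high risk of stroke, 0 = not high risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # predicted probabilities from some model
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

metrics = {
    "accuracy": accuracy(y_true, y_pred),
    "recall": recall(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "auroc": auroc(y_true, scores),
}
```

Computing the same dictionary for each model's predictions on the same held-out subjects gives a like-for-like comparison. Accuracy, recall, and F1 depend on the chosen threshold, whereas AUROC depends only on how the scores rank the cases.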
Stroke risk was positively correlated with age and negatively correlated with the frequency of fruit consumption and taste preference. People on a low-salt diet had a lower risk of stroke than those on a high-salt diet, and men had a higher stroke risk than women. In addition, stroke risk was positively correlated with the frequency of alcohol consumption in the middle-aged group and negatively correlated with education level in the elderly group. All four metrics were higher for LightGBM than for logistic regression, except for recall in the middle-aged group.
Age, gender, family history of hypertension and diabetes, frequency of consumption of fruit, alcohol, and dairy products, taste preference, and education level could serve as predictive risk factors for stroke. The model using the LightGBM algorithm was more accurate than the model using logistic regression analysis.