Predicting high sensitivity C-reactive protein levels and their associations in a large population using decision tree and linear regression.
Author information
Ghiasi Hafezi Somayeh, Sahranavard Toktam, Kooshki Alireza, Hosseini Marzieh, Mansoori Amin, Fakhrian Elham Amir, Rezaeifard Helia, Ghamsary Mark, Esmaily Habibollah, Ghayour-Mobarhan Majid
Affiliations
Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran.
Department of Applied Mathematics, School of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran.
Publication information
Sci Rep. 2024 Dec 5;14(1):30298. doi: 10.1038/s41598-024-81714-2.
High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation that predicts the incidence of various diseases. In this study, we aimed to evaluate the association of hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regression (LR) modeling. The study was conducted on 9,704 men and women aged 35 to 65 years, 57% of them female, recruited from the Mashhad Stroke and Heart Atherosclerotic Disorder (MASHAD) cohort study. We used a data mining approach based on the DT methodology to build a model that predicts hs-CRP level from biochemical factors and clinical features. Except for fasting blood glucose (FBG), hypertension (HTN), and type 2 diabetes mellitus (T2DM), all variables differed significantly between the two groups. In the LR models, anxiety score, depression score, systolic blood pressure, cardiovascular disease, and HTN were significant predictors of hs-CRP level. In the DT models, depression score, FBG, cholesterol, and anxiety score were identified as the most important predictors of hs-CRP level. The DT model predicted hs-CRP level with an accuracy of 72.1% in training and 71.4% in testing for both genders. The proposed DT model appears able to predict hs-CRP level from anxiety score, depression score, fasting blood glucose, systolic blood pressure, and history of cardiovascular disease.
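The abstract reports separate training and testing accuracies for a tree-based model. The following is a minimal sketch of how such figures are typically obtained, not the authors' implementation. It assumes a binary high/low hs-CRP label (the abstract does not state how hs-CRP was categorized), uses a one-split decision tree (a "stump") chosen by Gini impurity, and runs on synthetic data. The feature names, the 20-point depression threshold, the 20% label noise, and the 70/30 split are all hypothetical choices made for illustration; none of the numbers come from the MASHAD cohort.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(rows, labels):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no valid split found")
    _, j, t, left_label, right_label = best
    return j, t, left_label, right_label


def predict(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def accuracy(stump, rows, labels):
    hits = sum(predict(stump, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


# Synthetic data (hypothetical): columns are [depression_score, fbg].
# The label depends only on depression_score, with 20% of labels flipped.
random.seed(0)
rows, labels = [], []
for _ in range(400):
    depression = random.uniform(0, 40)
    fbg = random.gauss(95, 15)
    high = int(depression > 20)
    if random.random() < 0.2:
        high = 1 - high
    rows.append([depression, fbg])
    labels.append(high)

# Hold out 30% of the samples so accuracy is also measured on unseen data.
train_rows, test_rows = rows[:280], rows[280:]
train_labels, test_labels = labels[:280], labels[280:]

stump = fit_stump(train_rows, train_labels)
train_acc = accuracy(stump, train_rows, train_labels)
test_acc = accuracy(stump, test_rows, test_labels)
print(f"split on feature {stump[0]} at {stump[1]:.1f}")
print(f"training accuracy: {train_acc:.3f}, testing accuracy: {test_acc:.3f}")
```

A small gap between training and testing accuracy, as in the 72.1% versus 71.4% reported above, is usually read as a sign that the tree is not badly overfitted. A full decision tree repeats the same split search recursively within each branch.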