Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
National Nutrition and Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Arch Iran Med. 2024 May 1;27(5):239-247. doi: 10.34172/aim.2024.35.
Cardiovascular disease (CVD) is the leading cause of death worldwide. In this study, our main aim was to predict CVD from some of its most important indicators and to present a tree-based statistical framework for detecting CVD patients according to these indicators.
We used data from the baseline phase of the Fasa Cohort Study (FACS). The outcome variable was the presence of CVD. An ordinary tree and a generalized linear mixed model (GLMM) were fitted to the data, and their predictive power for detecting CVD was compared with that of the GLMM tree. Statistical analysis was performed using the RStudio software.
Data from 9499 participants aged 35‒70 years were analyzed. The results of the multivariable mixed-effects logistic regression model revealed that participants' age, total cholesterol, marital status, smoking status, glucose, history of cardiac disease or myocardial infarction (MI) in first- and second-degree relatives, and presence of other diseases (such as hypertension, depression, chronic headaches, and thyroid disease) were significantly related to the presence of CVD (P<0.05). Fitting the ordinary tree, GLMM, and GLMM tree resulted in area under the curve (AUC) values of 0.58 (0.56, 0.61), 0.81 (0.77, 0.84), and 0.80 (0.76, 0.83), respectively, among the study population. In addition, the tree model had the best specificity at 81% but the lowest sensitivity at 65% compared to the other models.
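The discrimination measures reported here (AUC, sensitivity, and specificity) can be computed from a model's predicted probabilities, as in the following minimal Python sketch. It is illustrative only: the study itself was carried out in RStudio, the function names `auc_score` and `sensitivity_specificity` are hypothetical, the 0.5 classification threshold is an assumption, and the toy labels and scores are invented for the example and are not FACS data.

```python
from typing import Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Classify score >= threshold as positive, then return
    (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: 1 = CVD present, 0 = CVD absent (not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_auc = auc_score(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, 0.5)
print(f"AUC={toy_auc:.4f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

The pairwise AUC definition is used here because it is easy to verify by hand. It is quadratic in the number of cases, so a rank-based implementation would be preferable for a cohort of this size. The confidence intervals reported for the AUC values are not reproduced by this sketch.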
Given the superior performance of the GLMM tree compared with the standard tree and the lack of a significant difference from the GLMM, we suggest using the GLMM tree because of its simpler interpretation and fewer assumptions. Using updated statistical models for more accurate CVD prediction can yield more precise frameworks to aid in planning proactive patient detection.