a Institute for Community Medicine, Ernst Moritz Arndt University, Greifswald, Germany
b Siemens Healthcare, Malvern, Pennsylvania, USA
c Clinic of Internal Medicine B, Ernst Moritz Arndt University, Greifswald
d Institute of Epidemiology, Christian Albrechts University, Kiel
e Interfaculty Institute of Functional Genomics, Ernst Moritz Arndt University, Greifswald, Germany
f Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup
g Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
h Institute of Physiology, University Medicine, Ernst Moritz Arndt University, Greifswald
i University Medical Center, Göttingen, Germany
* Henry Völzke and Glenn Fung contributed equally to the writing of this article.
J Hypertens. 2013 Nov;31(11):2142-50; discussion 2150. doi: 10.1097/HJH.0b013e328364a16d.
Data mining is an alternative approach for identifying new predictors of multifactorial diseases. This work aimed to build an accurate predictive model for incident hypertension using data mining procedures.
The primary study population consisted of 1605 normotensive individuals aged 20-79 years with 5-year follow-up from the population-based Study of Health in Pomerania (SHIP). This initial set was randomly split into a training set and a testing set. We used a Bayesian network, a type of probabilistic graphical model, to build a predictive model for incident hypertension and compared its predictive performance with that of the established Framingham risk score for hypertension. Finally, the model was validated in 2887 participants from INTER99, a Danish community-based intervention study.
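The random split described above can be sketched as follows. This is only an illustration: the abstract does not report the split ratio, the random seed, or the software used, so `train_fraction`, `seed`, and the function name `split_train_test` are assumptions made for the example, and the participant IDs are placeholders rather than SHIP data.

```python
import random


def split_train_test(ids, train_fraction=0.5, seed=0):
    """Randomly partition participant IDs into a training and a testing list.

    train_fraction and seed are hypothetical values; the abstract does not
    state how the split was actually performed.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 1605 normotensive participants.
participants = range(1605)
train, test = split_train_test(participants, train_fraction=0.5, seed=42)
```

Fixing the seed keeps the partition reproducible, which matters when a model fitted on the training set is later evaluated on the held-out set.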
In the training set of SHIP data, the Bayesian network used a small subset of relevant baseline features: age, mean arterial pressure, rs16998073, serum glucose and urinary albumin concentrations. Furthermore, we detected relevant interactions between age and serum glucose as well as between rs16998073 and urinary albumin concentrations (area under the receiver operating characteristic curve [AUC] 0.76). The model was confirmed in the SHIP validation set (AUC 0.78) and externally replicated in INTER99 (AUC 0.77). Compared with the established Framingham risk score for hypertension, the predictive performance of the new model was similar in the SHIP validation set and moderately better in INTER99.
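For readers unfamiliar with the AUC values quoted above, the following minimal sketch shows one standard way to compute the statistic: as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions; they are not the authors' code or data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risks; labels: 1 for incident hypertension, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0        # case ranked above non-case
            elif p == n:
                wins += 0.5        # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up risks: three of the four case/non-case pairs
# are ordered correctly.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.76 to 0.78.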
Data mining procedures identified a predictive model for incident hypertension, which included innovative and easy-to-measure variables. The findings promise great applicability in screening settings and clinical practice.