Internal Medicine Department, Mostoles University Hospital, Calle Rio Jucar, s/n, 28935, Mostoles, Madrid, Spain.
Rey Juan Carlos University, Móstoles, Spain.
Med Biol Eng Comput. 2019 Sep;57(9):2011-2026. doi: 10.1007/s11517-019-02007-9. Epub 2019 Jul 25.
Appropriate management of hypertensive patients relies on the accurate identification of clinically relevant features. However, traditional statistical methods may ignore important information in datasets or overlook possible interactions among features. Machine learning may improve the prediction accuracy and interpretability of regression models by identifying the most relevant features in hypertensive patients.

We sought the most relevant features for predicting cardiovascular (CV) events in a hypertensive population. To obtain the most parsimonious and accurate models, we used two penalized regression methods: the least absolute shrinkage and selection operator (LASSO) and the elastic net (EN). Clinical parameters and laboratory biomarkers were collected from the clinical records of 1,471 patients receiving care at Mostoles University Hospital. The outcome was the development of major adverse CV events. Cox proportional hazards regression was performed alone and with each penalty (LASSO and EN), producing three models; the penalized models were fitted using 10-fold cross-validation. The three predictive models were compared and statistically analyzed to assess their classification accuracy, sensitivity, specificity, discriminative power, and calibration accuracy.

The standard Cox model identified five relevant features, while LASSO and EN identified only three (age, LDL cholesterol, and kidney function). The accuracies of the models (prediction vs. observation) were 0.767 (Cox model), 0.754 (LASSO), and 0.764 (EN), and the areas under the curve were 0.694, 0.670, and 0.673, respectively. However, pairwise comparison of performance yielded no statistically significant differences. All three calibration curves showed close agreement between the predicted and observed probabilities of developing a CV event.
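The performance measures reported above (accuracy, sensitivity, specificity, and area under the curve) can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy labels and risk scores, and the 0.5 classification threshold are all assumptions made for illustration, and the sketch treats the outcome as a simple binary label rather than a time-to-event variable.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels (1 = CV event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of true events that were flagged
        "specificity": tn / (tn + fp),  # share of non-events correctly cleared
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random event case receives a higher
    score than a random non-event case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT taken from the study.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
risk = [0.8, 0.3, 0.6, 0.4, 0.2, 0.1, 0.7, 0.5]

# Assumed threshold of 0.5 to turn predicted risk into a predicted class.
y_pred = [1 if r >= 0.5 else 0 for r in risk]

metrics = classification_metrics(y_true, y_pred)
model_auc = auc(y_true, risk)
print(metrics, round(model_auc, 3))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, which is one reason to report them side by side.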
Although performance was similar for all three models, both penalized regression analyses produced models with good fit and fewer features than the standard Cox regression model, with no statistically significant loss of accuracy. This case study shows that penalized regularization techniques can provide predictive models for CV risk assessment that are parsimonious, highly interpretable, and generalizable, and that have good fit. For clinicians, a parsimonious model can be useful where available data are limited, as it offers a simple but efficient way to model the impact of the different features on the prediction of CV events. Management of these features may lower the risk of a CV event.

Graphical abstract: In a clinical setting, with numerous biological and laboratory features and incomplete datasets, traditional statistical methods may ignore important information and overlook possible interactions among features. Our aim was to identify the most relevant features for predicting cardiovascular events in a hypertensive population, using three different regression approaches for feature selection, and thereby to improve the prediction accuracy and interpretability of regression models.
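Why penalized models end up with fewer features can be seen in a minimal sketch of the soft-thresholding step that underlies LASSO and elastic net coordinate updates. This is an illustration under stated assumptions, not the study's fitting procedure: it uses the closed-form update for a single standardized feature in a linear model, whereas a penalized Cox model applies an analogous update inside an iterative algorithm. The coefficients, the penalty strength `lam`, and the mixing parameter `alpha` are hypothetical.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_update(z: float, lam: float, alpha: float) -> float:
    """Single-coefficient elastic net update for a standardized feature.

    alpha = 1.0 gives the LASSO (L1 only); 0 < alpha < 1 mixes L1 and L2.
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Hypothetical unpenalized coefficients for five features, NOT from the study.
unpenalized: List[float] = [0.9, -0.5, 0.3, 0.1, -0.05]
lam = 0.2  # assumed penalty strength; in practice chosen by cross-validation

lasso = [elastic_net_update(z, lam, alpha=1.0) for z in unpenalized]
enet = [elastic_net_update(z, lam, alpha=0.5) for z in unpenalized]

# Features whose coefficient stays nonzero are the ones the model "selects".
lasso_selected = [i for i, b in enumerate(lasso) if b != 0.0]
enet_selected = [i for i, b in enumerate(enet) if b != 0.0]
print(lasso_selected, enet_selected)
```

In this toy case the two weakest coefficients are set exactly to zero under both penalties, which is the mechanism by which a penalized model can retain fewer features than an unpenalized one.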