Department of Cardiology, Heart Center Leipzig at University Leipzig, Leipzig, Germany.
Department of Pharmacological and Biomolecular Sciences, University of Milan, and I.R.C.C.S MultiMedica, Milan, Italy.
J Am Coll Cardiol. 2021 Oct 19;78(16):1621-1631. doi: 10.1016/j.jacc.2021.08.018.
Individualized risk prediction represents a prerequisite for providing personalized medicine.
This study compared proteomics-enabled machine-learning (ML) algorithms with classical and clinical risk prediction methods for all-cause mortality in patients with cardiovascular risk factors from the LIFE-Heart Study, followed by external validation in the PLIC (Progressione della Lesione Intimale Carotidea) study.
Using the OLINK-Cardiovascular-II panel, 92 proteins were measured in a cohort of 1,998 individuals from the LIFE-Heart Study (derivation) and 772 subjects from the PLIC cohort (external validation). We constructed protein-based mortality prediction models using eXtreme Gradient Boosting (XGBoost) and a neural network, and compared their predictive performance with classical clinical risk scores (Systematic Coronary Risk Evaluation, Framingham) and with logistic and Cox regression models.
All-cause mortality occurred in 156 (8%) patients in the internal validation and 68 (9%) patients in the external validation cohort, within a median follow-up of 10 and 11 years, respectively. On internal and external validation, the Framingham Risk Score achieved areas under the curve (AUCs) of 0.64 (95% CI: 0.59-0.68) and 0.65 (95% CI: 0.58-0.74), logistic regression AUCs of 0.65 (95% CI: 0.57-0.73) and 0.67 (95% CI: 0.59-0.74), Cox regression AUCs of 0.55 (95% CI: 0.51-0.59) and 0.65 (95% CI: 0.57-0.73), the XGBoost classifier AUCs of 0.83 (95% CI: 0.79-0.87) and 0.91 (95% CI: 0.86-0.95), the XGBoost survival estimator AUCs of 0.83 (95% CI: 0.79-0.87) and 0.93 (95% CI: 0.88-0.97), and the neural network AUCs of 0.87 (95% CI: 0.83-0.91) and 0.94 (95% CI: 0.90-0.98), respectively (modern vs classical ML: P < 0.001).
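For readers less familiar with the metric reported above, the following is a minimal sketch of how an AUC and an approximate 95% CI can be computed from predicted risks and observed outcomes. It assumes a percentile bootstrap for the interval; the abstract does not state which CI method the study used, and the function names `auc` and `bootstrap_auc_ci` as well as the toy data are hypothetical, chosen only for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a randomly chosen event (label 1)
    receives a higher score than a randomly chosen non-event (label 0).
    Tied scores count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC.
    Assumption: this is an illustrative method, not necessarily the
    one used in the study."""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class (AUC undefined).
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical): 1 = died during follow-up, 0 = survived.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise definition used here is quadratic in cohort size, which is acceptable for a sketch; for cohorts of the size reported above, a rank-based implementation from an established statistics library would be the more practical choice.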
ML-driven multiprotein risk models outperform classical regression models and clinical scores for prediction of all-cause mortality in patients at increased cardiovascular risk.