Bernstorff Martin, Hansen Lasse, Olesen Kevin Kris Warnakula, Danielsen Andreas Aalkjær, Østergaard Søren Dinesen
Department of Affective Disorders, Aarhus University Hospital - Psychiatry, Aarhus, Denmark.
Department of Clinical Medicine, Aarhus University, Aarhus, Denmark.
Eur Psychiatry. 2025 Jan 8;68(1):e12. doi: 10.1192/j.eurpsy.2025.1.
Cardiovascular disease (CVD) is twice as prevalent among individuals with mental illness as in the general population. Prevention strategies exist but require accurate risk prediction. This study aimed to develop and validate a machine learning model for predicting incident CVD among patients with mental illness using routine clinical data from electronic health records.
A cohort study was conducted using data from 74,880 patients with 1.6 million psychiatric service contacts in the Central Denmark Region from 2013 to 2021. Two machine learning models (XGBoost and regularised logistic regression) were trained on the 85% of the data that came from six hospitals, using 234 potential predictors. The best-performing model was externally validated on the remaining 15% of patients, who came from another three hospitals. CVD was defined as myocardial infarction, stroke, or peripheral arterial disease.
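The hospital-level split described above can be sketched as follows. This is a minimal illustration, not the study's code: the record layout, the `split_by_hospital` helper, and the hospital labels A to I are all hypothetical. The point it shows is that every hospital falls wholly into either the training or the validation set, so validation data come from sites the model has never seen.

```python
def split_by_hospital(records, validation_hospitals):
    """Split contact records so each hospital lands wholly in one set.

    `records` is a list of dicts with a "hospital" key (hypothetical layout).
    `validation_hospitals` is the set of hospitals held out for validation.
    """
    train, validation = [], []
    for record in records:
        if record["hospital"] in validation_hospitals:
            validation.append(record)
        else:
            train.append(record)
    return train, validation


# Toy data: nine hypothetical hospitals labelled A to I, two contacts each.
records = [
    {"contact_id": i, "hospital": hospital}
    for i, hospital in enumerate("ABCDEFGHI" * 2)
]

# Six hospitals (A to F) go to training, three (G to I) to validation.
train_records, validation_records = split_by_hospital(records, {"G", "H", "I"})
```

Splitting by site rather than at random means the share of data in each set (85% and 15% in the study) follows from hospital size instead of being chosen directly.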
The best-performing model (hyperparameter-tuned XGBoost) demonstrated acceptable discrimination, with an area under the receiver operating characteristic curve of 0.84 on the training set and 0.74 on the validation set. It identified high-risk individuals 2.5 years before CVD events. For the psychiatric service contacts in the top 5% of predicted risk, the positive predictive value was 5%, and the negative predictive value was 99%. The model issued at least one positive prediction for 39% of patients who developed CVD.
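To make the threshold-based figures above concrete, the sketch below computes the positive and negative predictive values when the contacts with the highest predicted risk are flagged as positive. It is an illustration under stated assumptions: the function name `ppv_npv_at_top_fraction`, the toy scores, and the toy labels are hypothetical and do not reproduce the study's data or pipeline.

```python
def ppv_npv_at_top_fraction(scores, labels, fraction=0.05):
    """Flag the top `fraction` of scores as positive; return (ppv, npv).

    `scores` are predicted risks, `labels` are 1 (event) or 0 (no event).
    """
    n = len(scores)
    n_flagged = max(1, round(n * fraction))
    # Indices ordered from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flagged])

    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = n_flagged - tp
    fn = sum(1 for i in range(n) if i not in flagged and labels[i] == 1)
    tn = (n - n_flagged) - fn

    ppv = tp / (tp + fp)
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


# Toy example: 20 contacts with increasing risk scores and three events.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 20
toy_labels[0] = 1   # an event among the unflagged contacts
toy_labels[18] = 1  # an event among the flagged contacts
toy_ppv, toy_npv = ppv_npv_at_top_fraction(toy_scores, toy_labels, fraction=0.1)
```

When events are rare, as with incident CVD, a high negative predictive value is expected almost regardless of the model, so the positive predictive value at the chosen threshold is the more informative of the two figures.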
A machine learning model can accurately predict CVD risk among patients with mental illness using routinely collected electronic health record data. A decision support system building on this approach may aid primary CVD prevention in this high-risk population.