Department of Geriatric Cardiology, National Clinical Research Center for Geriatric Diseases, Second Medical Center of Chinese PLA General Hospital, Beijing, 100853, China.
Medical School of Chinese PLA, Beijing, 100853, China.
BMC Cardiovasc Disord. 2024 Aug 13;24(1):420. doi: 10.1186/s12872-024-04082-9.
Accurate prediction of survival prognosis helps guide clinical decision-making. The aim of this study was to develop a machine learning model to predict the occurrence of composite thromboembolic events (CTEs) in elderly patients with atrial fibrillation (AF). These events comprise newly diagnosed cerebral ischemic events, cardiovascular events, pulmonary embolism, and lower extremity arterial embolism.
This retrospective study included 6,079 elderly hospitalized patients (≥ 75 years old) with AF admitted to the People's Liberation Army General Hospital in China from January 2010 to June 2022. Missing data were handled with random forest imputation. For the descriptive statistics, patients were divided into two groups according to whether CTEs occurred, and the groups were compared using chi-square tests for categorical variables and rank-sum tests for continuous variables. For the machine learning analysis, patients were randomly divided into a training dataset (n = 4,225) and a validation dataset (n = 1,824) in a 7:3 ratio. Four machine learning models (logistic regression, decision tree, random forest, and XGBoost) were trained on the training dataset and evaluated on the validation dataset.
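The 7:3 random split described above can be illustrated with a minimal standard-library sketch. The function name `split_train_validation`, the seed, and the 1,000 placeholder IDs are assumptions made for illustration; the abstract does not state which software or seed the authors used.

```python
import random


def split_train_validation(patient_ids, train_fraction=0.7, seed=42):
    """Randomly partition IDs into a training list and a validation list.

    Hypothetical helper: the name, default fraction, and seed are
    illustrative and not taken from the study.
    """
    shuffled = list(patient_ids)
    # A dedicated Random instance keeps the shuffle reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder cohort of 1,000 integer IDs (not the study data).
train_ids, validation_ids = split_train_validation(range(1000))
print(len(train_ids), len(validation_ids))
```

Fixing the seed makes the partition repeatable, which matters when several models are compared on the same validation set.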
The incidence of composite thromboembolic events was 19.53%. The Least Absolute Shrinkage and Selection Operator (LASSO) method, with 5-fold cross-validation, was applied to the training dataset and identified 18 features significantly associated with the occurrence of CTEs. The random forest model outperformed the other models, with the highest area under the curve (AUC: 0.927, 95% CI: 0.9105-0.9443) and an accuracy (ACC) of 0.9144, sensitivity (SEN) of 0.7725, and specificity (SPE) of 0.9489. The random forest model also showed good clinical validity on the clinical decision curve. SHapley Additive exPlanations (SHAP) analysis showed that the five features contributing most to the model were history of ischemic stroke, high triglyceride (TG), high total cholesterol (TC), high plasma D-dimer, and age.
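For readers unfamiliar with the reported metrics, the sketch below shows how ACC, SEN, SPE, and AUC are conventionally computed for a binary outcome. It uses only the Python standard library; the function names and the eight toy labels and scores are hypothetical and are not the study's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),  # proportion of events correctly flagged
        "SPE": tn / (tn + fp),  # proportion of non-events correctly cleared
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 patients with an event, 5 without (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.4, 0.05, 0.7]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

Because the event rate here is about 20%, accuracy can be dominated by the non-event majority, which is why sensitivity and AUC are worth reading alongside it.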
This study proposes an accurate model for identifying patients at high risk of CTEs. The random forest model performed well. History of ischemic stroke, age, high TG, high TC, and high plasma D-dimer may be associated with CTEs.