Lv Huasheng, Bi Xuehua, Shang Shuai, Wei Meng, Zhou Xianhui, Wang Kai, Tang Baopeng, Lu Yanmei
Department of Pacing and Electrophysiology, Department of Cardiac Electrophysiology and Remodeling, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, China.
Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, 830017, China.
Sci Rep. 2025 Aug 12;15(1):29554. doi: 10.1038/s41598-025-14579-8.
This study developed and validated a machine learning (ML) model to predict in-hospital cardiac mortality in 18,727 atrial fibrillation (AF) patients using electronic medical record data. Four ML algorithms (random forest, extreme gradient boosting (XGBoost), deep neural network, and logistic regression) were applied to 79 clinical variables, including demographics, vital signs, comorbidities, lifestyle factors, and laboratory parameters. The XGBoost model performed best, with an area under the curve of 0.964 ± 0.014 in the training set and 0.932 ± 0.057 in the validation set, alongside precision, accuracy, and recall of 0.909 ± 0.021, 0.910 ± 0.021, and 0.897 ± 0.038, respectively. Shapley Additive Explanations identified key predictors such as thyroid function indices (e.g., total triiodothyronine, total thyroxine), procalcitonin, N-terminal pro-brain natriuretic peptide, and international normalized ratio. This interpretable model holds promise for improving early risk stratification and individualized care in AF patients. Prospective, multi-center validation is needed to confirm its generalizability.
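For readers less familiar with the reported metrics, the following is a minimal sketch of how the area under the curve, precision, accuracy, and recall can be computed from predicted risks, and how a "mean ± SD" summary could be formed. It is an illustration only, not the authors' pipeline: the function names, the 0.5 decision threshold, the toy data, and the reading of "±" as a sample standard deviation across repeated evaluations are all assumptions that the abstract does not state.

```python
from statistics import mean, stdev


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_accuracy_recall(y_true, scores, threshold=0.5):
    """Binarize scores at `threshold` (an assumed cut-off) and return
    (precision, accuracy, recall)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, accuracy, recall


def mean_sd(values):
    """Summarize repeated evaluations as (mean, sample standard deviation)."""
    return mean(values), stdev(values)


# Hypothetical toy data: 1 = in-hospital cardiac death, 0 = survived.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_precision, toy_accuracy, toy_recall = precision_accuracy_recall(
    toy_labels, toy_scores
)

# Hypothetical per-fold AUC values, summarized as mean and SD.
fold_auc_mean, fold_auc_sd = mean_sd([0.90, 0.95, 1.00])
```

Because the AUC is computed from the ranking of scores, it does not depend on the decision threshold, whereas precision, accuracy, and recall do. This is one reason a model can report a high AUC alongside somewhat lower thresholded metrics.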