Risk prediction of heart failure in patients with ischemic heart disease using network analytics and stacking ensemble learning.

Author information

School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, Sichuan, 611731, P.R. China.

Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

BMC Med Inform Decis Mak. 2023 May 23;23(1):99. doi: 10.1186/s12911-023-02196-2.

DOI:10.1186/s12911-023-02196-2

PMID:37221512

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10207812/

Abstract

BACKGROUND

Heart failure (HF) is a major complication of ischemic heart disease (IHD) and adversely affects patient outcomes. Early prediction of HF risk in patients with IHD supports timely intervention and helps reduce disease burden.

METHODS

Two cohorts were established from hospital discharge records in Sichuan, China, covering 2015-2019: cases, patients first diagnosed with IHD and subsequently with HF (N = 11,862), and controls, IHD patients without HF (N = 25,652). A directed personal disease network (PDN) was constructed for each patient, and the PDNs within each cohort were merged to generate a baseline disease network (BDN) for that cohort, which identifies the health trajectories of patients and their complex progression patterns. The differences between the BDNs of the two cohorts were represented as a disease-specific network (DSN). Three novel network features were extracted from the PDN and DSN to represent the similarity of disease patterns and the specificity trends from IHD to HF. A stacking-based ensemble model, DXLR, was proposed to predict HF risk in IHD patients using the novel network features and basic demographic features (i.e., age and sex). The SHapley Additive exPlanations (SHAP) method was applied to analyze feature importance in the DXLR model.
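The PDN, BDN, and DSN steps described above can be illustrated with a minimal Python sketch. The abstract does not give the exact construction rules, so everything here is an assumption made for illustration: the function names, the rule that a directed edge runs from an earlier first diagnosis to a later one, the BDN edge weight (fraction of patients in the cohort carrying the edge), the DSN as the positive weight difference between case and control BDNs, and the toy records with ICD-10-style labels. The `dsn_overlap` score is a hypothetical stand-in and is not one of the three network features from the paper.

```python
from collections import Counter
from itertools import combinations


def build_pdn(diagnoses):
    """Directed PDN as a set of edges (a, b): a was first diagnosed before b.

    `diagnoses` is a list of (day, code) pairs for one patient (assumed format).
    """
    first_seen = {}
    for day, code in sorted(diagnoses):
        first_seen.setdefault(code, day)  # keep the earliest day per code
    edges = set()
    for a, b in combinations(first_seen, 2):
        if first_seen[a] < first_seen[b]:
            edges.add((a, b))
        elif first_seen[b] < first_seen[a]:
            edges.add((b, a))
        # Same-day diagnoses get no edge because their order is unknown.
    return edges


def build_bdn(pdns):
    """Merge PDNs into a BDN: edge -> fraction of patients having that edge."""
    counts = Counter(edge for pdn in pdns for edge in pdn)
    return {edge: count / len(pdns) for edge, count in counts.items()}


def build_dsn(case_bdn, control_bdn):
    """Keep edges that are more frequent in cases than in controls (assumed rule)."""
    dsn = {}
    for edge, weight in case_bdn.items():
        diff = weight - control_bdn.get(edge, 0.0)
        if diff > 0:
            dsn[edge] = diff
    return dsn


def dsn_overlap(pdn, dsn):
    """Hypothetical feature: total DSN weight of the edges present in a PDN."""
    return sum(dsn[edge] for edge in pdn if edge in dsn)


# Toy records (invented): (day of diagnosis, ICD-10-style code).
cases = [
    [(0, "I25"), (30, "I10"), (90, "I50")],
    [(0, "I10"), (10, "I25"), (50, "I50")],
]
controls = [
    [(0, "I25"), (40, "I10")],
    [(0, "I25"), (20, "E11")],
]

case_pdns = [build_pdn(p) for p in cases]
control_pdns = [build_pdn(p) for p in controls]
dsn = build_dsn(build_bdn(case_pdns), build_bdn(control_pdns))

for pdn in case_pdns + control_pdns:
    print(sorted(pdn), dsn_overlap(pdn, dsn))
```

In this toy data the edge from I25 to I10 is equally common in both cohorts, so it drops out of the DSN, while edges leading into I50 remain. A per-patient score of this kind could then be fed, alongside age and sex, into any stacking classifier.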

RESULTS

Compared with six traditional machine learning models, the DXLR model achieved the highest AUC (0.934 ± 0.004), accuracy (0.857 ± 0.007), precision (0.723 ± 0.014), recall (0.892 ± 0.012) and F score (0.798 ± 0.010). The feature importance analysis showed that the novel network features ranked as the top three features, playing a notable role in predicting HF risk in IHD patients. The feature comparison experiment also indicated that the novel network features outperformed those proposed by a state-of-the-art study in improving the performance of the prediction model, with increases of 19.9% in AUC, 18.7% in accuracy, 30.7% in precision, 37.4% in recall, and 33.7% in F score.
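For readers less familiar with the threshold-based metrics reported above, the following Python sketch shows how accuracy, precision, recall, and the F score relate to confusion-matrix counts. It assumes the "F score" is the F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function names and the confusion-matrix counts are invented for illustration. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def f1_from(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Invented counts, not taken from the study.
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=80)
print(metrics)

# Consistency check on the reported means (precision 0.723, recall 0.892).
implied_f1 = f1_from(0.723, 0.892)
print(round(implied_f1, 3))
```

Under the F1 assumption, the reported mean precision and recall imply an F score of roughly 0.799, which agrees with the reported 0.798 ± 0.010. Exact agreement is not expected, since a mean of per-fold F scores need not equal the F score of the mean precision and recall.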

CONCLUSIONS

The proposed approach, which combines network analytics and ensemble learning, effectively predicts HF risk in patients with IHD. This highlights the potential value of network-based machine learning for disease risk prediction using administrative data.
