IEEE J Biomed Health Inform. 2022 Jul;26(7):3362-3372. doi: 10.1109/JBHI.2022.3148820. Epub 2022 Jul 1.
Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction but remain unexplainable, hampering their wider use in medical practice. We aimed to develop a deep-learning framework for accurate yet explainable prediction of 6-month incident heart failure (HF). Using 100,071 patients from longitudinal linked electronic health records across the U.K., we applied a novel Transformer-based risk model using all community and hospital diagnoses and medications, contextualized within the age and calendar year of each patient's clinical encounter. Feature importance was investigated with an ablation analysis, comparing model performance when alternately removing features, and by comparing the variability of temporal representations. A post-hoc perturbation technique was used to propagate changes in the input to the outcome for feature contribution analyses. Our model achieved an area under the receiver operating characteristic curve of 0.93 and an area under the precision-recall curve of 0.69 in internal 5-fold cross-validation, and outperformed existing deep learning models. Ablation analysis indicated that medication is important for predicting HF risk and that calendar year is more important than chronological age, a finding further reinforced by the temporal variability analysis. Contribution analyses identified risk factors that are closely related to HF. Many of them were consistent with existing knowledge from clinical and epidemiological research, but several new associations were revealed that had not been considered in expert-driven risk prediction models. In conclusion, the results highlight that our deep learning model, in addition to high predictive performance, can inform data-driven risk factor identification.
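The abstract does not specify the perturbation technique in detail, so the following is only a minimal sketch of the general idea behind perturbation-based contribution analysis, using a simple leave-one-out (occlusion) variant as a stand-in rather than the authors' method. It assumes a hypothetical toy risk model (`toy_risk`, a sigmoid over hand-set per-code weights in `TOY_WEIGHTS`) in place of the Transformer, and it ignores the age and calendar-year context. All code names and weights are illustrative and not drawn from the study. A code's contribution is taken to be the drop in predicted risk when that code is removed from the patient's record.

```python
import math
from typing import Callable, Dict, List

# Hypothetical per-code weights for a toy linear risk model.
# Illustrative only: not learned from data and not taken from the paper.
TOY_WEIGHTS: Dict[str, float] = {
    "hypertension": 0.8,
    "type2_diabetes": 0.6,
    "loop_diuretic": 1.2,
    "asthma": -0.1,
}
TOY_BIAS = -3.0


def toy_risk(codes: List[str]) -> float:
    """Probability-like risk: sigmoid(bias + sum of weights of codes in the record).

    Unknown codes get weight 0. A stand-in for a trained sequence model.
    """
    logit = TOY_BIAS + sum(TOY_WEIGHTS.get(c, 0.0) for c in codes)
    return 1.0 / (1.0 + math.exp(-logit))


def occlusion_contributions(
    codes: List[str], risk_fn: Callable[[List[str]], float] = toy_risk
) -> Dict[str, float]:
    """Leave-one-out perturbation over distinct codes.

    For each distinct code, remove every occurrence of it from the record,
    re-score, and report (original risk - perturbed risk).
    """
    baseline = risk_fn(codes)
    contributions: Dict[str, float] = {}
    for code in dict.fromkeys(codes):  # distinct codes, first-seen order
        perturbed = [c for c in codes if c != code]
        contributions[code] = baseline - risk_fn(perturbed)
    return contributions


if __name__ == "__main__":
    # A hypothetical patient record (sequence of diagnosis/medication codes).
    record = ["hypertension", "asthma", "loop_diuretic", "hypertension"]
    ranked = sorted(occlusion_contributions(record).items(), key=lambda kv: -kv[1])
    for code, delta in ranked:
        print(f"{code}: {delta:+.3f}")
```

Positive values mark codes that push the predicted risk up, and negative values mark codes that pull it down. Removing every occurrence of a repeated code at once, rather than one occurrence at a time, is a simplifying choice made here; a sequence model could instead be probed position by position.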