Fondazione Bruno Kessler Research Institute, Trento, Italy; University of Trento, Italy.
Critical Care Services, Mayo Clinic, Jacksonville, FL, USA.
Artif Intell Med. 2023 Oct;144:102659. doi: 10.1016/j.artmed.2023.102659. Epub 2023 Sep 14.
Deep Learning (DL) models have received increasing attention in the clinical setting, particularly in intensive care units (ICUs). In this context, the interpretability of the outcomes estimated by DL models is an essential step towards their wider adoption in clinical practice. To address this challenge, we propose an ante-hoc interpretable neural network model. Our proposed model, named double self-attention architecture (DSA), uses two attention-based mechanisms: self-attention and effective attention. It can capture the overall importance of input variables, as well as changes in their importance along the time dimension, for the outcome of interest. We evaluated our model on two real-world clinical datasets covering 22,840 patients, predicting the onset of delirium 12 h and 48 h in advance. Additionally, we compared the descriptive performance of our model with that of three post-hoc interpretable algorithms, as well as with the opinion of clinicians based on the published literature and clinical experience. We find that our model covers the majority of the top-10 variables ranked by the three post-hoc interpretable algorithms and by clinical opinion, with the advantage of taking into account both the dependencies among variables and the dependencies between time steps. Finally, our results show that our model can improve descriptive performance without sacrificing predictive performance.
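The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of applying self-attention once across variables and once across time steps. It is not the authors' DSA model: the function names, the random untrained projections, the synthetic input, and the "mean attention received" importance score are all assumptions made for illustration, and the effective-attention component is not implemented.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng, d_k=8):
    """Single-head scaled dot-product self-attention.

    tokens: array of shape (n_tokens, d_in).
    The projection matrices are random and untrained (illustration only).
    Returns (output, weights), where weights has shape (n_tokens, n_tokens)
    and each row sums to 1.
    """
    d_in = tokens.shape[1]
    w_q = rng.normal(size=(d_in, d_k))
    w_k = rng.normal(size=(d_in, d_k))
    w_v = rng.normal(size=(d_in, d_k))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_steps, n_vars = 6, 4                      # hypothetical sizes
x = rng.normal(size=(n_steps, n_vars))      # synthetic stand-in for one patient's record

# Attention across variables: each variable is a token described by its time series.
_, var_weights = self_attention(x.T, rng)   # shape (n_vars, n_vars)

# Attention across time steps: each time step is a token described by its variables.
_, time_weights = self_attention(x, rng)    # shape (n_steps, n_steps)

# A crude importance score (an assumption): the average attention each token receives.
var_importance = var_weights.mean(axis=0)
time_importance = time_weights.mean(axis=0)

print("variable importance:", np.round(var_importance, 3))
print("time-step importance:", np.round(time_importance, 3))
```

Because every row of an attention matrix sums to 1, each importance vector here also sums to 1, which makes the scores comparable as shares of total attention. In a trained model the projections would be learned, and the scores would reflect the prediction task rather than random weights.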