Kwak Heeyoung, Chang Jooyoung, Choe Byeongjin, Park Sangmin, Jung Kyomin
Department of Electrical Engineering, Seoul National University, Seoul, Republic of Korea.
Department of Biomedical Sciences, Seoul National University, Seoul, Republic of Korea.
J Am Med Inform Assoc. 2021 Sep 18;28(10):2155-2164. doi: 10.1093/jamia/ocab109.
We propose an interpretable disease prediction model that efficiently fuses multiple types of patient records using a self-attentive fusion encoder. We assessed the model performance in predicting cardiovascular disease events, given the records of a general patient population.
We extracted 7981 cases and 67 623 controls from the sample cohort database and nationwide healthcare claims data of South Korea. Among the information provided, our model used the sequential records of medical codes and patient characteristics, such as demographic profiles and the most recent health examination results. These two types of patient records were combined in our self-attentive fusion module, whereas previously dominant methods aggregated them using a simple concatenation. The prediction performance was compared to state-of-the-art recurrent neural network-based approaches and other widely used machine learning approaches.
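To make the contrast between simple concatenation and attention-based fusion concrete, the following is a minimal sketch and not the authors' architecture. It assumes each visit in the medical-code sequence has already been embedded as a fixed-length vector and that the patient characteristics are embedded in the same dimension. The function names `concat_fusion` and `attention_fusion`, the mean pooling, and the use of the patient vector as the attention query are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def concat_fusion(visits, patient):
    """Baseline (assumed form): mean-pool the visit vectors, then append
    the patient-characteristics vector unchanged."""
    dim = len(visits[0])
    pooled = [sum(v[i] for v in visits) / len(visits) for i in range(dim)]
    return pooled + list(patient)


def attention_fusion(visits, patient):
    """Hypothetical fusion: scaled dot-product attention in which the
    patient vector is the query and the visit vectors are keys and values."""
    scale = math.sqrt(len(patient))
    weights = softmax([dot(patient, v) / scale for v in visits])
    dim = len(visits[0])
    fused = [sum(w * v[i] for w, v in zip(weights, visits)) for i in range(dim)]
    return fused, weights


# Toy inputs: three visits and one patient vector, all 2-dimensional.
visits = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patient = [2.0, 0.0]

concat_out = concat_fusion(visits, patient)
fused, weights = attention_fusion(visits, patient)
```

In the concatenation baseline every visit contributes equally regardless of the patient, whereas in the attention variant the weight on each visit depends on how it interacts with the patient vector; the returned `weights` are what an interpretation step would inspect.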
Our model outperformed all the other compared methods in predicting cardiovascular disease events. It achieved an area under the curve of 0.839, while the other compared methods achieved between 0.741 and 0.830. Moreover, our model consistently outperformed the other methods in a more challenging setting in which we tested the model's ability to draw an inference from less obvious and more diverse factors.
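For readers unfamiliar with the metric, the sketch below computes an area under the curve on toy data, assuming the reported value refers to the receiver operating characteristic curve. It uses the pairwise (Mann-Whitney) formulation: the fraction of case-control pairs in which the case receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are illustrative and are not data from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores
    a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 2 cases (label 1) and 3 controls (label 0).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 5 of the 6 case-control pairs are ordered correctly
```

The quadratic loop is adequate for illustration; on a cohort of tens of thousands of patients a sort-based implementation from an established library would be the more practical choice.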
We also interpreted the attention weights provided by our model as the relative importance of each time step in the sequence. We showed that our model reveals the informative parts of the patients' history by measuring the attention weights.
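Reading attention weights as per-time-step importance can be sketched as a simple ranking. The example assumes one attention weight per time step that sums to 1 across the sequence; the helper `top_time_steps` and the weight values are hypothetical and do not come from the study.

```python
def top_time_steps(attention_weights, k=3):
    """Return the k (time_step_index, weight) pairs with the largest weights."""
    ranked = sorted(
        enumerate(attention_weights),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical weights over five time steps of one patient's history.
weights = [0.05, 0.10, 0.50, 0.05, 0.30]
top = top_time_steps(weights, k=2)
# top == [(2, 0.5), (4, 0.3)]: time steps 2 and 4 carry most of the attention
```

A ranking like this only shows where the model placed its attention; whether those time steps are clinically meaningful still has to be judged against the underlying records.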
We suggest an interpretable disease prediction model that efficiently fuses heterogeneous patient records and demonstrates superior disease prediction performance.