Zhao Dan, Shi Yuliang, Cheng Lin, Li Hui, Zhang Liguo, Guo Hongmei
School of Software, Shandong University, China.
School of Software, Shandong University, China; Dareway Software Co., Ltd, China.
J Biomed Inform. 2023 Mar;139:104239. doi: 10.1016/j.jbi.2022.104239. Epub 2022 Nov 7.
Deep learning methods have achieved success in disease prediction using electronic health record (EHR) data, but most existing methods have limitations. First, most methods apply a homogeneous decay to model the effect of the time interval on information from a patient's previous visits. However, the effect of the time interval between visits is not always negative. For example, although patients with chronic diseases often have relatively long intervals between visits, the previous visit remains highly important to the next one, so the effect of the interval cannot be treated as negative in this case. That is, the time interval affects previous visits in a nonmonotonic manner: the effect may be positive, negative, or neutral. Second, most methods do not take into account the effect of text information on prediction results. The text in the EHR describes the patient's past medical history and current symptoms, which is important for prediction. To address these issues, we propose a Time Interval Uncertainty-Aware and Text-Enhanced Based Disease Prediction Model, which exploits the uncertain effects of time intervals and the patient's text information for disease prediction. First, we apply a cross-attention mechanism to generate a global representation of the patient from the disease and text information in the EHR. Then, we use a key-query attention mechanism to obtain two importance weights, one for the visit sequence with time intervals and one for the visit sequence without them. Finally, we perform disease prediction by making slight adjustments to the encoder part of the Transformer, a deep learning model based on a self-attention mechanism.
We compare our model with various state-of-the-art models on two publicly available datasets, MIMIC-III and MIMIC-IV, and select the 10 most frequent diseases in the dataset as the target diseases. On the MIMIC-III dataset, our model is up to three percent higher than the best baseline in terms of evaluation metrics.
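As a rough illustration of the key-query attention step described above, the following minimal sketch computes two sets of importance weights over a patient's visits, one that ignores time intervals and one that includes them. It is not the authors' implementation: the function names, the `tanh` time encoding, the parameters `w_time` and `b_time`, and the use of a single query vector standing in for the global patient representation are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def key_query_weights(query, keys):
    """Scaled dot-product scores of one query against T keys, normalized to sum to 1."""
    d = keys.shape[-1]
    return softmax(keys @ query / np.sqrt(d))


def visit_importance(visits, intervals, query, w_time, b_time):
    """Return (weights_without_time, weights_with_time) over T visits.

    visits:    (T, d) visit embeddings
    intervals: (T,)   time gap associated with each visit
    query:     (d,)   stand-in for the global patient representation
    w_time, b_time: (d,) hypothetical parameters of the time encoding
    """
    # Assumed time encoding: each interval is mapped to a d-dimensional
    # vector. Because it is added to the keys before scoring, a longer
    # interval can raise, lower, or leave unchanged a visit's weight,
    # rather than always decaying it.
    time_emb = np.tanh(np.outer(intervals, w_time) + b_time)  # (T, d)

    weights_without_time = key_query_weights(query, visits)
    weights_with_time = key_query_weights(query, visits + time_emb)
    return weights_without_time, weights_with_time


# Toy example with random values in place of learned embeddings.
rng = np.random.default_rng(0)
T, d = 4, 8
visits = rng.normal(size=(T, d))
intervals = np.array([400.0, 90.0, 30.0, 0.0]) / 365.0  # gaps in years
query = rng.normal(size=d)
w_time = rng.normal(size=d)
b_time = rng.normal(size=d)

alpha, beta = visit_importance(visits, intervals, query, w_time, b_time)
```

Here `alpha` and `beta` each sum to 1 over the four visits. How the two weight vectors are combined downstream is not specified in the abstract, so the sketch stops at computing them.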