Darabi Sajad, Kachuee Mohammad, Fazeli Shayan, Sarrafzadeh Majid
IEEE J Biomed Health Inform. 2020 Nov;24(11):3268-3275. doi: 10.1109/JBHI.2020.2984931. Epub 2020 Nov 4.
Effective representation learning of electronic health records is a challenging task and is becoming more important as the availability of such data becomes pervasive. The data contained in these records are irregular and span multiple modalities such as notes and medical codes. They are prompted by medical conditions the patient may have and are typically recorded by medical staff. Accompanying the codes are notes containing valuable information about patients beyond the structured information contained in electronic health records. We use transformer networks and the recently proposed BERT language model to embed these data streams into a unified vector representation. The presented approach effectively encodes a patient's visit data into a single distributed representation, which can be used for downstream tasks. Our model demonstrates superior performance and generalization on mortality, readmission, and length-of-stay prediction tasks using the publicly available MIMIC-III ICU dataset.
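As a rough illustration of the idea described in the abstract (not the authors' implementation), the sketch below embeds a clinical note with a pretrained BERT model and a sequence of medical codes with a small transformer encoder, then concatenates the two streams into one visit-level vector; the model name, dimensions, and toy code vocabulary are assumptions.

```python
# Minimal sketch: unify a note stream and a medical-code stream into a single
# visit representation. All hyperparameters and model choices are illustrative.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

NOTE_MODEL = "bert-base-uncased"   # assumed; the paper uses a BERT-style language model
CODE_VOCAB_SIZE = 1000             # hypothetical number of distinct medical codes
CODE_EMB_DIM = 128

tokenizer = AutoTokenizer.from_pretrained(NOTE_MODEL)
note_encoder = AutoModel.from_pretrained(NOTE_MODEL)

# Small transformer encoder over embedded code sequences.
code_embedding = nn.Embedding(CODE_VOCAB_SIZE, CODE_EMB_DIM)
code_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=CODE_EMB_DIM, nhead=4, batch_first=True),
    num_layers=2,
)

def embed_visit(note_text: str, code_ids: list) -> torch.Tensor:
    """Return a single distributed representation for one patient visit."""
    with torch.no_grad():
        # Note stream: use the [CLS] token embedding as the note summary.
        tokens = tokenizer(note_text, return_tensors="pt",
                           truncation=True, max_length=512)
        note_vec = note_encoder(**tokens).last_hidden_state[:, 0, :]  # (1, 768)

        # Code stream: embed the codes and mean-pool the transformer outputs.
        codes = torch.tensor(code_ids).unsqueeze(0)                   # (1, seq_len)
        code_vec = code_encoder(code_embedding(codes)).mean(dim=1)    # (1, 128)

    # Unified visit representation, usable as input to downstream task heads
    # (e.g. mortality, readmission, or length-of-stay classifiers).
    return torch.cat([note_vec, code_vec], dim=-1)                    # (1, 896)

visit_vector = embed_visit("Patient admitted with chest pain ...", [12, 87, 453])
print(visit_vector.shape)  # torch.Size([1, 896])
```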