Bai Tian, Chanda Ashis Kumar, Egleston Brian L, Vucetic Slobodan
Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA.
Fox Chase Cancer Center, Philadelphia, PA 19111, USA.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:764-769. doi: 10.1109/BIBM.2017.8217752. Epub 2017 Dec 18.
There has been increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns medical concept and word representations. In particular, we focus on capturing the relationship between medical codes and words by using a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of an EHR within the same visit and embeds both codes and words in the same continuous vector space. As a result, we are able to derive clusters that reflect distinct disease and treatment patterns. In our experiments, we qualitatively show how our method of grouping words for given diagnostic codes compares with a topic modeling approach. We also test how well our representations can be used to predict disease patterns of the next visit. The results show that our approach outperforms several common methods.
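The abstract does not spell out the learning scheme, so the following is only a minimal sketch of one way that codes and words from the same visit could be tied together as skip-gram training pairs. The function name `visit_pairs`, the `window` parameter, the pairing rules, and the toy visit are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import permutations


def visit_pairs(codes, words, window=2):
    """Return (target, context) skip-gram pairs for a single visit.

    Assumed pairing rules (illustrative only, not the paper's scheme):
      * codes in a visit are unordered, so each code is context for every other code;
      * words keep their order in the note, so a sliding window is used;
      * every code is paired with every word of the same visit, in both directions.
    """
    pairs = []

    # Code-to-code pairs: all ordered pairs of distinct codes.
    pairs.extend(permutations(codes, 2))

    # Word-to-word pairs: neighbours within `window` positions.
    for i, word in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        pairs.extend((word, words[j]) for j in range(lo, hi) if j != i)

    # Cross-modal pairs: link each code with each word of the same visit.
    for code in codes:
        for word in words:
            pairs.append((code, word))
            pairs.append((word, code))

    return pairs


# Hypothetical toy visit: two code tokens and a tokenized clinical note.
codes = ["ICD9_428.0", "ICD9_584.9"]
words = ["heart", "failure", "acute", "kidney", "injury"]
pairs = visit_pairs(codes, words, window=1)
```

Because codes and words share one vocabulary in these pairs, any standard skip-gram trainer fed with them would place both in the same vector space, which is the property the abstract describes. The actual weighting and sampling of pairs in the paper may differ.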