Prediction task guided representation learning of medical codes in EHR.

Affiliations

Department of Industrial Engineering, Tsinghua University, Beijing, China.

Publication information

J Biomed Inform. 2018 Aug;84:1-10. doi: 10.1016/j.jbi.2018.06.013. Epub 2018 Jun 19.

Abstract

Applications of machine learning models for predictive analytics on Electronic Health Records (EHR) are growing rapidly, with the aim of improving the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is converting medical codes in EHR into feature vectors. These medical codes represent diagnoses or procedures, and their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have used representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e., the generation of medical code vectors is independent of the prediction task. Thus, the resulting feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require many samples to obtain reliable results, whereas most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks to construct a training corpus for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in the predictive capability of the generated medical code vectors, especially when training samples are limited.
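The abstract does not spell out the aggregation rule, so the following is only an illustrative sketch of the general idea of building a training corpus guided by a prediction task. It assumes, hypothetically, that each health record is a list of code strings paired with the value of the prediction target, and that records sharing a target value are merged into one "document" that a representation learning model could then be trained on. The function name `aggregate_by_label`, the input format, and the toy records are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def aggregate_by_label(records):
    """Group medical codes by prediction-task label (illustrative only).

    records: iterable of (codes, label) pairs, where `codes` is a list of
    medical code strings from one health record and `label` is the value
    of the prediction target. This input format is an assumption.
    Returns a dict mapping each label to one aggregated list of codes.
    """
    corpus = defaultdict(list)
    for codes, label in records:
        # Append this record's codes to the document for its label,
        # preserving the order in which records were seen.
        corpus[label].extend(codes)
    return dict(corpus)


# Made-up toy records: (ICD-10-style codes, hypothetical binary outcome).
records = [
    (["I10", "E11.9"], 0),
    (["I50.9", "N18.3"], 1),
    (["I10", "J45.909"], 0),
    (["I50.9", "E11.9"], 1),
]

corpus = aggregate_by_label(records)
for label, codes in sorted(corpus.items()):
    print(label, codes)
```

Under this assumption, codes that tend to appear with the same outcome end up in the same context, which is one plausible way a word-embedding style model could pick up task-relevant similarity. The actual PTGHRA procedure may differ.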
