Finch Anthony, Crowell Alexander, Bhatia Mamta, Parameshwarappa Pooja, Chang Yung-Chieh, Martinez Jose, Horberg Michael
Kaiser Permanente Mid-Atlantic Permanente Medical Group, Rockville, Maryland, USA.
Kaiser Permanente Mid-Atlantic Permanente Research Institute, Rockville, Maryland, USA.
JAMIA Open. 2021 Mar 16;4(1):ooab022. doi: 10.1093/jamiaopen/ooab022. eCollection 2021 Jan.
To construct and publicly release a set of medical concept embeddings for ICD-10 codes that explicitly incorporate the hierarchical information of medical codes into the embedding formulation.
We trained concept embeddings with several new extensions to the Word2Vec algorithm on a dataset of approximately 600,000 patients from a major integrated healthcare organization in the Mid-Atlantic US. Our concept embeddings included additional entities to account for the medical categories assigned to codes by the Clinical Classifications Software Refined (CCSR) dataset. We compared these results against publicly released sets of pretrained embeddings and alternative training methodologies.
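The core idea of folding CCSR categories into the embedding vocabulary can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the code-to-category mapping, the `CAT_` token prefix, and the interleaving scheme are all assumptions made for demonstration.

```python
# Hypothetical sketch: augment a patient's ICD-10 code sequence with CCSR
# category tokens so that a Skip-Gram model learns embeddings for both
# codes and categories in a shared vector space.
# The mapping below is illustrative, not the real CCSR crosswalk.
CCSR_MAP = {
    "E11.9": "END005",  # type 2 diabetes -> an endocrine category (illustrative)
    "I10":   "CIR007",  # essential hypertension (illustrative)
}

def augment_with_categories(codes, mapping):
    """Interleave each code with its CCSR category token.

    Word2Vec then treats category tokens as ordinary vocabulary items,
    so codes sharing a category acquire similar contexts.
    """
    out = []
    for code in codes:
        out.append(code)
        cat = mapping.get(code)
        if cat is not None:
            out.append("CAT_" + cat)
    return out

patient_sequence = ["E11.9", "I10"]
sentence = augment_with_categories(patient_sequence, CCSR_MAP)
print(sentence)  # ['E11.9', 'CAT_END005', 'I10', 'CAT_CIR007']
```

The augmented sentences would then be fed to a standard Skip-Gram trainer; co-training (mentioned in the Results) presumably updates code and category vectors jointly in this shared vocabulary.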
We found that Word2Vec models that included hierarchical data outperformed ordinary Word2Vec alternatives on tasks comparing naïve clusters to the canonical ones provided by CCSR. Our Skip-Gram model trained on both codes and categories achieved 61.4% normalized mutual information with the canonical labels, compared with 57.5% for traditional Skip-Gram. Across models trained on two different outcomes, including hierarchical embedding data improved classification performance 96.2% of the time. Controlling for all other variables, co-training embeddings improved classification performance 66.7% of the time. All of our models outperformed the competitive benchmarks.
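The normalized mutual information (NMI) scores above measure agreement between induced clusters and canonical CCSR labels. A minimal self-contained sketch of the metric is below; the arithmetic-mean normalization is an assumption, since the abstract does not specify which variant was used.

```python
# Sketch of normalized mutual information between two label assignments,
# e.g. embedding-derived clusters vs. canonical CCSR categories.
# Normalization by the arithmetic mean of the entropies is assumed here.
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """NMI(a, b) = I(a; b) / mean(H(a), H(b)); 1.0 for identical partitions."""
    n = len(a)
    joint = Counter(zip(a, b))
    ca, cb = Counter(a), Counter(b)
    mi = sum((c / n) * log((c * n) / (ca[x] * cb[y]))
             for (x, y), c in joint.items())
    denom = (entropy(a) + entropy(b)) / 2
    return mi / denom if denom > 0 else 1.0

# Identical partitions (up to relabeling) give NMI = 1.0.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Higher NMI against the CCSR labels indicates that the naïve clusters recover more of the canonical category structure.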
We found significant evidence that our proposed algorithms can express the hierarchical structure of medical codes more fully than ordinary Word2Vec models, and that this improvement carries forward into classification tasks. As part of this publication, we have released several sets of pretrained medical concept embeddings using the ICD-10 standard which significantly outperform other well-known pretrained vectors on our tested outcomes.