Deep-learning-based automated terminology mapping in OMOP-CDM.

Authors

Kang Byungkon, Yoon Jisang, Kim Ha Young, Jo Sung Jin, Lee Yourim, Kam Hye Jin

Affiliations

Department of Computer Science, State University of New York, Incheon, South Korea.

Graduate School of Information, Yonsei University, Seoul, South Korea.

Publication

J Am Med Inform Assoc. 2021 Jul 14;28(7):1489-1496. doi: 10.1093/jamia/ocab030.

Abstract

OBJECTIVE

Accessing medical data from multiple institutions is difficult owing to the interinstitutional diversity of vocabularies. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that can automate the process of semantic interinstitutional code mapping.

MATERIALS AND METHODS

To automate mapping between source and target codes, we compute the embedding-based semantic similarity between corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to traditional word-based mappings.
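The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' model: `toy_embed` is a hypothetical stand-in (hashed character trigrams) for a trained sentence encoder, and the target codes `T1` to `T3` with their descriptions are invented for the example. Only the overall shape, embedding both descriptions and ranking targets by similarity, follows the abstract.

```python
import math
import zlib

DIM = 256  # size of the toy embedding vector (assumption, not from the paper)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a trained sentence encoder.

    Counts hashed character trigrams so the example runs without any
    model weights. A real system would use learned embeddings instead.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_targets(source_desc: str, targets: dict[str, str], k: int = 3):
    """Return the k (code, score) pairs most similar to the source description."""
    src = toy_embed(source_desc)
    scored = [(code, cosine(src, toy_embed(desc))) for code, desc in targets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented target vocabulary, for illustration only.
targets = {
    "T1": "Type 2 diabetes mellitus",
    "T2": "Essential hypertension",
    "T3": "Acute myocardial infarction",
}
ranked = rank_targets("diabetes mellitus type 2", targets)
for code, score in ranked:
    print(code, round(score, 3))
```

Swapping `toy_embed` for a trained encoder leaves `rank_targets` unchanged, which is why the encoder is kept behind a single function here.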

RESULTS

The proposed model is compared against Usagi, the state-of-the-art automated matching system of the Observational Medical Outcomes Partnership common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements.
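For readers unfamiliar with top-k measurements, the metric can be sketched as below: a source code counts as correctly matched if its gold target appears among the first k ranked candidates. The function name `top_k_accuracy` and the codes `S1` to `S3` and `T1` to `T9` are hypothetical, and the numbers are toy values, not results from the study.

```python
def top_k_accuracy(ranked_predictions: dict[str, list[str]],
                   gold: dict[str, str], k: int) -> float:
    """Fraction of source codes whose gold target is within the top k candidates."""
    hits = sum(
        1 for source, ranked in ranked_predictions.items()
        if gold[source] in ranked[:k]
    )
    return hits / len(ranked_predictions)


# Toy ranked candidate lists per source code (best match first).
predictions = {
    "S1": ["T1", "T4", "T2"],
    "S2": ["T3", "T2", "T5"],
    "S3": ["T9", "T8", "T6"],
}
gold = {"S1": "T1", "S2": "T2", "S3": "T6"}

for k in (1, 2, 3):
    print(k, top_k_accuracy(predictions, gold, k))
```

Accuracy can only stay level or rise as k grows, so comparing two systems at several values of k shows whether an advantage holds beyond the single best guess.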

DISCUSSION

The proposed deep learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system.
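The abstract does not say how the negative samples are chosen, so the following is only a sketch of what "multiple negatives per positive" and "selection manner" can look like in practice. Both strategies (`"random"` and a lexical-overlap `"hard"` variant), the helper names, and the vocabulary are assumptions made for illustration, not the authors' procedure.

```python
import random


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (hypothetical hardness score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_examples(source_desc: str, positive_code: str, targets: dict[str, str],
                   n_neg: int, strategy: str = "random", seed: int = 0):
    """Return one positive and n_neg negative (source, target, label) triples."""
    candidates = [code for code in targets if code != positive_code]
    if strategy == "random":
        # Uniformly sampled negatives; seeded so the example is repeatable.
        negatives = random.Random(seed).sample(candidates, n_neg)
    elif strategy == "hard":
        # Negatives whose descriptions look most like the source text.
        candidates.sort(key=lambda c: token_overlap(source_desc, targets[c]),
                        reverse=True)
        negatives = candidates[:n_neg]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    examples = [(source_desc, targets[positive_code], 1)]
    examples += [(source_desc, targets[code], 0) for code in negatives]
    return examples


# Invented target vocabulary, for illustration only.
targets = {
    "T1": "Type 2 diabetes mellitus",
    "T2": "Type 1 diabetes mellitus",
    "T3": "Essential hypertension",
    "T4": "Gestational diabetes",
    "T5": "Acute myocardial infarction",
}
source = "diabetes mellitus type 2"
random_examples = build_examples(source, "T1", targets, n_neg=2, strategy="random")
hard_examples = build_examples(source, "T1", targets, n_neg=2, strategy="hard")
```

With the `"hard"` strategy the model sees near-miss descriptions such as "Type 1 diabetes mellitus" as negatives, whereas random sampling mostly yields unrelated ones; a difference of this kind is one plausible reason selection could affect performance.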

CONCLUSION

Incorporating the semantics of code descriptions increases matching accuracy significantly more than traditional text co-occurrence-based approaches do. The method for collecting negative training samples is also an important component of the proposed trainable system and can be adopted in both present and future related systems.
