Clinical Term Normalization Using Learned Edit Patterns and Subconcept Matching: System Development and Evaluation.

Author information

Kate Rohit J

Affiliation

Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, United States.

Publication information

JMIR Med Inform. 2021 Jan 14;9(1):e23104. doi: 10.2196/23104.

DOI:10.2196/23104

PMID:33443483

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7843202/

Abstract

BACKGROUND

Clinical terms mentioned in clinical text are often not in their standardized forms as listed in clinical terminologies because of linguistic and stylistic variations. However, many automated downstream applications require clinical terms mapped to their corresponding concepts in clinical terminologies, thus necessitating the task of clinical term normalization.

OBJECTIVE

In this paper, a system for clinical term normalization is presented that utilizes edit patterns to convert clinical terms into their normalized forms.

METHODS

The edit patterns are automatically learned from the Unified Medical Language System (UMLS) Metathesaurus as well as from the given training data. They are generalized sequences of edits derived from edit distance computations. The patterns are both character based and word based, and they are learned separately for different semantic types. In addition to these edit patterns, the system also normalizes clinical terms through the subconcepts mentioned within them.
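To make the idea of edit patterns derived from edit distance more concrete, the following is a minimal Python sketch, not the system's actual implementation. It computes a standard Levenshtein edit script between a term and its normalized form, then collapses runs of unchanged items into a wildcard. The function names `edit_script` and `generalize`, the tuple representation, and the `*` wildcard are all assumptions made for illustration; the abstract does not specify how the patterns are represented or generalized.

```python
from typing import List, Sequence, Tuple

# (operation, source item, target item); illustrative representation only
Edit = Tuple[str, str, str]


def edit_script(source: Sequence[str], target: Sequence[str]) -> List[Edit]:
    """Return one minimum-cost edit script turning source into target.

    A string yields character-level edits; a list of tokens yields
    word-level edits.
    """
    n, m = len(source), len(target)
    # dist[i][j] = edit distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete source item
                dist[i][j - 1] + 1,         # insert target item
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Walk back from the bottom-right cell to recover the edits.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and source[i - 1] == target[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            edits.append(("keep" if same else "sub", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("del", source[i - 1], ""))
            i -= 1
        else:
            edits.append(("ins", "", target[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def generalize(edits: List[Edit]) -> List[Edit]:
    """Collapse each run of unchanged items into a single '*' wildcard.

    This is one assumed way to generalize; the real system may differ.
    """
    wildcard: Edit = ("*", "", "")
    pattern: List[Edit] = []
    for edit in edits:
        if edit[0] == "keep":
            if not pattern or pattern[-1] != wildcard:
                pattern.append(wildcard)
        else:
            pattern.append(edit)
    return pattern


# Character-level example: a spelling variant
char_pattern = generalize(edit_script("tumour", "tumor"))

# Word-level example: dropping a modifier
word_pattern = generalize(edit_script(["left", "knee", "pain"], ["knee", "pain"]))
```

Because the same function accepts either a string or a token list, it covers both the character-based and the word-based case. Learning patterns separately per semantic type could then amount to keeping one collection of such patterns for each type, though that bookkeeping is not shown here.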

RESULTS

The system was evaluated as part of the 2019 n2c2 Track 3 shared task of clinical term normalization. It obtained 80.79% accuracy on the standard test data. This paper includes ablation studies to evaluate the contributions of different components of the system. A challenging part of the task was disambiguation when a clinical term could be normalized to multiple concepts.

CONCLUSIONS

The learned edit patterns led the system to perform well on the normalization task. Given that the system is based on patterns, it is human interpretable and is also capable of giving insights about common variations of clinical terms mentioned in clinical text that are different from their standardized forms.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/86e7/7843202/4cbd1b002d8b/medinform_v9i1e23104_fig1.jpg

Similar articles

Normalizing clinical terms using learned edit distance patterns.

J Am Med Inform Assoc. 2016 Mar;23(2):380-6. doi: 10.1093/jamia/ocv108. Epub 2015 Jul 31.

The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1529-1537. doi: 10.1093/jamia/ocaa106.

Clinical concept normalization with a hybrid natural language processing system combining multilevel matching and machine learning ranking.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1576-1584. doi: 10.1093/jamia/ocaa155.

A Hybrid Normalization Method for Medical Concepts in Clinical Narrative using Semantic Matching.

AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:732-740. eCollection 2019.

CUILESS2016: a clinical corpus applying compositional normalization of text mentions.

J Biomed Semantics. 2018 Jan 10;9(1):2. doi: 10.1186/s13326-017-0173-6.

Mapping terms to UMLS concepts of the same semantic type.

AMIA Annu Symp Proc. 2007 Oct 11:1136.

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation.

J Biomed Inform. 2017 Sep;73:137-147. doi: 10.1016/j.jbi.2017.08.001. Epub 2017 Aug 7.

Coverage of patient safety terms in the UMLS metathesaurus.

AMIA Annu Symp Proc. 2003;2003:110-4.

Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS.

J Am Med Inform Assoc. 2002 Nov-Dec;9(6):621-36. doi: 10.1197/jamia.m1101.

Cited by

End-to-end Chinese clinical event extraction based on large language model.

Sci Rep. 2025 May 15;15(1):16943. doi: 10.1038/s41598-025-00609-y.

Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Database (Oxford). 2022 Jul 1;2022. doi: 10.1093/database/baac047.

A simple neural vector space model for medical concept normalization using concept embeddings.

J Biomed Inform. 2022 Jun;130:104080. doi: 10.1016/j.jbi.2022.104080. Epub 2022 Apr 23.

Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers.

JMIR Med Inform. 2021 Nov 1;9(11):e29120. doi: 10.2196/29120.
