A new clustering method for detecting rare senses of abbreviations in clinical notes.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.

Publication information

J Biomed Inform. 2012 Dec;45(6):1075-83. doi: 10.1016/j.jbi.2012.06.003. Epub 2012 Jun 25.

DOI:10.1016/j.jbi.2012.06.003

PMID:22742938

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3729222/

Abstract

Abbreviations are widely used in clinical documents, and they are often ambiguous. Building a list of possible senses (a sense inventory) for each ambiguous abbreviation is the first step toward automatically identifying the correct meaning of an abbreviation in a given context. Clustering-based methods have been used to detect senses of abbreviations from a clinical corpus [1]. However, rare senses remain challenging, and existing algorithms do not detect them reliably. In this study, we developed a new two-phase clustering algorithm called Tight Clustering for Rare Senses (TCRS) and applied it to sense generation for abbreviations in clinical text. Using manually annotated sense inventories for a set of 13 ambiguous clinical abbreviations, we compared TCRS with the existing Expectation Maximization (EM) clustering algorithm for sense generation at two levels of annotation cost (10 vs. 20 instances per abbreviation). With similar annotation effort (about 20 instances), the TCRS-based method detected 85% of senses on average, whereas the EM-based method found only 75%. Further analysis showed that the improvement came mainly from additionally detected rare senses, indicating that TCRS is useful for building more complete sense inventories of clinical abbreviations.
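The evaluation idea in the abstract (what fraction of an abbreviation's senses are found for a given annotation cost) can be sketched as follows. This is only an illustration, not the authors' procedure: the function name `sense_coverage`, the rule of annotating the first `per_cluster` instances of each cluster, and the toy data are all assumptions, since the abstract does not say how instances are selected for annotation.

```python
from collections import defaultdict


def sense_coverage(cluster_ids, gold_senses, per_cluster=1):
    """Return (coverage, annotated) for one abbreviation.

    coverage  : fraction of distinct gold senses seen among annotated instances
    annotated : number of instances annotated (the annotation cost)

    Assumption: the first `per_cluster` members of each cluster are annotated.
    """
    if len(cluster_ids) != len(gold_senses):
        raise ValueError("cluster_ids and gold_senses must have equal length")
    if not gold_senses:
        raise ValueError("at least one instance is required")

    # Group the gold sense of every instance by its assigned cluster.
    by_cluster = defaultdict(list)
    for cid, sense in zip(cluster_ids, gold_senses):
        by_cluster[cid].append(sense)

    found = set()
    annotated = 0
    for members in by_cluster.values():
        picked = members[:per_cluster]  # instances sent to the annotator
        annotated += len(picked)
        found.update(picked)

    return len(found) / len(set(gold_senses)), annotated


# Hypothetical toy data: six instances of one abbreviation, one rare sense "c".
gold = ["a", "a", "a", "b", "a", "c"]
merged_rare = [0, 0, 0, 1, 0, 1]    # rare sense shares a cluster with "b"
separate_rare = [0, 0, 0, 1, 0, 2]  # rare sense gets its own tight cluster

coverage_merged, cost_merged = sense_coverage(merged_rare, gold)
coverage_separate, cost_separate = sense_coverage(separate_rare, gold)
```

In this toy case the clustering that isolates the rare sense reaches full coverage at a cost of three annotated instances, while the one that merges it finds only two of the three senses. That is the kind of difference the reported 85% vs. 75% comparison measures.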

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a80/3729222/23369f9fc030/nihms389526f1.jpg

References cited in this article

Detecting abbreviations in discharge summaries using machine learning methods.

AMIA Annu Symp Proc. 2011;2011:1541-9. Epub 2011 Oct 22.

Methods for building sense inventories of abbreviations in clinical notes.

J Am Med Inform Assoc. 2009 Jan-Feb;16(1):103-8. doi: 10.1197/jamia.M2927. Epub 2008 Oct 24.

A study of abbreviations in clinical notes.

AMIA Annu Symp Proc. 2007 Oct 11;2007:821-5.

Extracting information from textual documents in the electronic health record: a review of recent research.

Yearb Med Inform. 2008:128-44.

Ambiguous abbreviations: an audit of abbreviations in paediatric note keeping.

Arch Dis Child. 2008 Mar;93(3):204-6. doi: 10.1136/adc.2007.128132. Epub 2007 Nov 6.

ADAM: another database of abbreviations in MEDLINE.

Bioinformatics. 2006 Nov 15;22(22):2813-8. doi: 10.1093/bioinformatics/btl480. Epub 2006 Sep 18.

A multi-aspect comparison study of supervised word sense disambiguation.

J Am Med Inform Assoc. 2004 Jul-Aug;11(4):320-31. doi: 10.1197/jamia.M1533. Epub 2004 Apr 2.

SaRAD: a Simple and Robust Abbreviation Dictionary.

Bioinformatics. 2004 Mar 1;20(4):527-33. doi: 10.1093/bioinformatics/btg439. Epub 2004 Jan 22.

Pathology abbreviated: a long review of short terms.

Arch Pathol Lab Med. 2004 Mar;128(3):347-52. doi: 10.5858/2004-128-347-PAALRO.

"Understanding" medical school curriculum content using KnowledgeMap.

J Am Med Inform Assoc. 2003 Jul-Aug;10(4):351-62. doi: 10.1197/jamia.M1176. Epub 2003 Mar 28.
