A new clustering method for detecting rare senses of abbreviations in clinical notes.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.

Publication information

J Biomed Inform. 2012 Dec;45(6):1075-83. doi: 10.1016/j.jbi.2012.06.003. Epub 2012 Jun 25.

Abstract

Abbreviations are widely used in clinical documents, and they are often ambiguous. Building a list of possible senses (a sense inventory) for each ambiguous abbreviation is the first step toward automatically identifying the correct meaning of an abbreviation in a given context. Clustering-based methods have been used to detect senses of abbreviations from a clinical corpus [1]. However, rare senses remain challenging, and existing algorithms do not detect them reliably. In this study, we developed a new two-phase clustering algorithm called Tight Clustering for Rare Senses (TCRS) and applied it to sense generation for abbreviations in clinical text. Using manually annotated sense inventories for a set of 13 ambiguous clinical abbreviations, we evaluated TCRS against the existing Expectation Maximization (EM) clustering algorithm for sense generation at two levels of annotation cost (10 vs. 20 instances per abbreviation). With similar annotation effort (about 20 instances), the TCRS-based method detected 85% of senses on average, whereas the EM-based method found only 75%. Further analysis showed that the improvement from TCRS came mainly from additionally detected rare senses, indicating its usefulness for building more complete sense inventories of clinical abbreviations.
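The evaluation described in the abstract (what share of the gold-standard senses is found for a given annotation budget) can be illustrated with a minimal sketch. It is not the authors' TCRS or EM implementation: it assumes that one representative instance per cluster is shown to an annotator, so the number of clusters equals the annotation cost. The function names, the "first member" choice of representative, and the toy sense labels are all hypothetical.

```python
def first_member_representatives(cluster_ids):
    """Pick the first instance of each cluster as its representative.

    This is a placeholder choice made for illustration only; a real system
    might instead pick the instance closest to the cluster centroid.
    """
    reps = {}
    for idx, cid in enumerate(cluster_ids):
        reps.setdefault(cid, idx)
    return reps


def sense_coverage(gold_senses, representatives):
    """Fraction of distinct gold senses seen among the annotated representatives.

    gold_senses: one manually assigned sense label per instance.
    representatives: dict mapping cluster id -> index of the annotated instance.
    """
    all_senses = set(gold_senses)
    detected = {gold_senses[i] for i in representatives.values()}
    return len(detected) / len(all_senses)


# Invented toy data: six instances of one abbreviation with three senses,
# one of which is rare (it occurs once).
gold = ["patient", "patient", "physical therapy",
        "patient", "prothrombin time", "physical therapy"]

# A coarse clustering that merges the rare sense into a large cluster ...
coarse = [0, 0, 1, 0, 0, 1]
# ... and a finer clustering that isolates the rare instance.
fine = [0, 0, 1, 0, 2, 1]

coverage_coarse = sense_coverage(gold, first_member_representatives(coarse))
coverage_fine = sense_coverage(gold, first_member_representatives(fine))
print(f"coarse: {coverage_coarse:.2f}, fine: {coverage_fine:.2f}")
```

In this toy case the coarse clustering misses the rare sense while the finer one recovers it at the cost of one extra annotation, which is the kind of trade-off the 10 vs. 20 instance comparison in the abstract measures.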

Similar articles

1
A new clustering method for detecting rare senses of abbreviations in clinical notes.
J Biomed Inform. 2012 Dec;45(6):1075-83. doi: 10.1016/j.jbi.2012.06.003. Epub 2012 Jun 25.
2
Methods for building sense inventories of abbreviations in clinical notes.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):103-8. doi: 10.1197/jamia.M2927. Epub 2008 Oct 24.
3
A study of abbreviations in clinical notes.
AMIA Annu Symp Proc. 2007 Oct 11;2007:821-5.
4
A sense inventory for clinical abbreviations and acronyms created using clinical notes and medical dictionary resources.
J Am Med Inform Assoc. 2014 Mar-Apr;21(2):299-307. doi: 10.1136/amiajnl-2012-001506. Epub 2013 Jun 27.
7
A deep database of medical abbreviations and acronyms for natural language processing.
Sci Data. 2021 Jun 2;8(1):149. doi: 10.1038/s41597-021-00929-4.
8
A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.
10
Automatic resolution of ambiguous terms based on machine learning and conceptual relations in the UMLS.
J Am Med Inform Assoc. 2002 Nov-Dec;9(6):621-36. doi: 10.1197/jamia.m1101.

Cited by

1
Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.
Methods Inf Med. 2022 Jun;61(S 01):e28-e34. doi: 10.1055/s-0042-1742388. Epub 2022 Feb 1.
4
A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.

References

1
Detecting abbreviations in discharge summaries using machine learning methods.
AMIA Annu Symp Proc. 2011;2011:1541-9. Epub 2011 Oct 22.
2
Methods for building sense inventories of abbreviations in clinical notes.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):103-8. doi: 10.1197/jamia.M2927. Epub 2008 Oct 24.
3
A study of abbreviations in clinical notes.
AMIA Annu Symp Proc. 2007 Oct 11;2007:821-5.
5
Ambiguous abbreviations: an audit of abbreviations in paediatric note keeping.
Arch Dis Child. 2008 Mar;93(3):204-6. doi: 10.1136/adc.2007.128132. Epub 2007 Nov 6.
6
ADAM: another database of abbreviations in MEDLINE.
Bioinformatics. 2006 Nov 15;22(22):2813-8. doi: 10.1093/bioinformatics/btl480. Epub 2006 Sep 18.
7
A multi-aspect comparison study of supervised word sense disambiguation.
J Am Med Inform Assoc. 2004 Jul-Aug;11(4):320-31. doi: 10.1197/jamia.M1533. Epub 2004 Apr 2.
8
SaRAD: a Simple and Robust Abbreviation Dictionary.
Bioinformatics. 2004 Mar 1;20(4):527-33. doi: 10.1093/bioinformatics/btg439. Epub 2004 Jan 22.
9
Pathology abbreviated: a long review of short terms.
Arch Pathol Lab Med. 2004 Mar;128(3):347-52. doi: 10.5858/2004-128-347-PAALRO.
10
"Understanding" medical school curriculum content using KnowledgeMap.
J Am Med Inform Assoc. 2003 Jul-Aug;10(4):351-62. doi: 10.1197/jamia.M1176. Epub 2003 Mar 28.
