A new clustering method for detecting rare senses of abbreviations in clinical notes.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.

Publication

J Biomed Inform. 2012 Dec;45(6):1075-83. doi: 10.1016/j.jbi.2012.06.003. Epub 2012 Jun 25.

Abstract

Abbreviations are widely used in clinical documents, and they are often ambiguous. Building a list of possible senses (also called a sense inventory) for each ambiguous abbreviation is the first step toward automatically identifying the correct meanings of abbreviations in given contexts. Clustering-based methods have been used to detect senses of abbreviations from a clinical corpus [1]. However, rare senses remain challenging, and existing algorithms do not detect them well. In this study, we developed a new two-phase clustering algorithm called Tight Clustering for Rare Senses (TCRS) and applied it to sense generation for abbreviations in clinical text. Using manually annotated sense inventories for a set of 13 ambiguous clinical abbreviations, we evaluated and compared TCRS with the existing Expectation Maximization (EM) clustering algorithm for sense generation at two levels of annotation cost (10 vs. 20 instances per abbreviation). Our results showed that, with similar annotation effort (about 20 instances), the TCRS-based method detected 85% of senses on average, whereas the EM-based method found only 75%. Further analysis showed that the improvement from TCRS came mainly from additionally detected rare senses, indicating its usefulness for building more complete sense inventories of clinical abbreviations.
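The abstract measures sense generation as the share of an abbreviation's reference senses that turn up once a fixed number of instances (10 or 20) is annotated. The sketch below illustrates that bookkeeping under stated assumptions. It is not TCRS or EM, whose details the abstract does not give. `cluster_of` stands in for the output of any clustering step. `sample_by_cluster` is a hypothetical round-robin sampler, one instance per cluster per pass. `sense_coverage` is a hypothetical helper computing the detected fraction. The toy data is invented for illustration only. Reading "85% on average" as a mean of this per-abbreviation fraction across the 13 abbreviations is also an assumption.

```python
from collections import defaultdict


def sample_by_cluster(cluster_of, budget):
    """Pick up to `budget` instances for annotation, cycling through clusters.

    cluster_of: dict mapping instance id -> cluster id. This is a hypothetical
    stand-in for the output of any clustering step (not TCRS or EM specifically).
    """
    members = defaultdict(list)
    for inst, cluster in sorted(cluster_of.items()):
        members[cluster].append(inst)
    queues = [members[c] for c in sorted(members)]

    picked, depth = [], 0
    # Take the depth-th member of every cluster before going one level deeper.
    while len(picked) < budget and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(picked) < budget:
                picked.append(q[depth])
        depth += 1
    return picked


def sense_coverage(picked, gold_sense, inventory):
    """Fraction of the reference sense inventory seen among annotated instances."""
    found = {gold_sense[i] for i in picked}
    return len(found & inventory) / len(inventory)


# Hypothetical toy data for one abbreviation: 6 instances, 3 senses.
# Sense "b" is rare and shares cluster 1 with a frequent-sense instance.
gold_sense = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "c"}
cluster_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
inventory = {"a", "b", "c"}

for budget in (2, 3, 5):
    picked = sample_by_cluster(cluster_of, budget)
    print(budget, picked, round(sense_coverage(picked, gold_sense, inventory), 2))
```

In this toy case, the rare sense "b" is hidden in a cluster dominated by sense "a", so it surfaces only at the largest budget. This is the kind of miss that a clustering method tuned for rare senses aims to reduce.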

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a80/3729222/23369f9fc030/nihms389526f1.jpg

References

2. Methods for building sense inventories of abbreviations in clinical notes.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):103-8. doi: 10.1197/jamia.M2927. Epub 2008 Oct 24.
6. ADAM: another database of abbreviations in MEDLINE.
Bioinformatics. 2006 Nov 15;22(22):2813-8. doi: 10.1093/bioinformatics/btl480. Epub 2006 Sep 18.
7. A multi-aspect comparison study of supervised word sense disambiguation.
J Am Med Inform Assoc. 2004 Jul-Aug;11(4):320-31. doi: 10.1197/jamia.M1533. Epub 2004 Apr 2.
8. SaRAD: a Simple and Robust Abbreviation Dictionary.
Bioinformatics. 2004 Mar 1;20(4):527-33. doi: 10.1093/bioinformatics/btg439. Epub 2004 Jan 22.
9. Pathology abbreviated: a long review of short terms.
Arch Pathol Lab Med. 2004 Mar;128(3):347-52. doi: 10.5858/2004-128-347-PAALRO.
10. "Understanding" medical school curriculum content using KnowledgeMap.
J Am Med Inform Assoc. 2003 Jul-Aug;10(4):351-62. doi: 10.1197/jamia.M1176. Epub 2003 Mar 28.