Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods.

Affiliations

Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Department of Computer Science, University of Massachusetts, Lowell, Massachusetts, USA.

Publication information

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):842-9. doi: 10.1136/amiajnl-2013-002133. Epub 2014 Jan 17.

DOI:10.1136/amiajnl-2013-002133

PMID:24441986

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4147600/

Abstract

OBJECTIVE

To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources.

MATERIALS AND METHODS

The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation.
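As a rough illustration of the graph-based family, the sketch below runs personalized PageRank over a tiny hand-made concept graph and picks the candidate sense with the highest rank. It is a minimal sketch under stated assumptions, not the paper's implementation: the graph, the concept names, and the helper functions `personalized_pagerank` and `disambiguate` are all hypothetical, and a real system would operate over UMLS relations rather than a toy edge list.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration PageRank whose restart mass sits on the seed nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:  # treat the toy graph as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor spreads its rank evenly over its own edges.
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = (1 - damping) * restart[n] + damping * incoming
        rank = new_rank
    return rank


def disambiguate(edges, context_concepts, candidate_senses):
    """Pick the candidate sense ranked highest given the context concepts."""
    rank = personalized_pagerank(edges, set(context_concepts))
    return max(candidate_senses, key=lambda s: rank.get(s, 0.0))


# Hypothetical toy graph; these node names are NOT real UMLS concepts.
TOY_EDGES = [
    ("common_cold", "cough"),
    ("common_cold", "fever"),
    ("common_cold", "virus"),
    ("fever", "infection"),
    ("virus", "infection"),
    ("infection", "weather"),
    ("cold_temperature", "ice"),
    ("cold_temperature", "weather"),
    ("weather", "ice"),
]

rank = personalized_pagerank(TOY_EDGES, {"cough", "fever"})
best = disambiguate(TOY_EDGES, ["cough", "fever"],
                    ["common_cold", "cold_temperature"])
print(best)
```

Seeding the restart distribution with the concepts found in the context is what makes the ranking context-specific: senses close to the context in the graph accumulate more rank than distant ones.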

RESULTS

The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40-50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help.
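To make the comparison against the most-frequent-sense baseline concrete, the following sketch computes that baseline and plain accuracy on a handful of made-up sense-tagged instances. The data, sense labels, and function names are hypothetical, and the sketch assumes the baseline is computed per ambiguous word on the evaluation set itself, which the abstract does not specify.

```python
from collections import Counter, defaultdict


def most_frequent_sense_accuracy(instances):
    """Accuracy of always predicting each word's most frequent gold sense.

    `instances` is a list of (ambiguous_word, gold_sense) pairs.
    """
    counts = defaultdict(Counter)
    for word, sense in instances:
        counts[word][sense] += 1
    # For each word, the baseline gets exactly its majority-sense count right.
    correct = sum(max(c.values()) for c in counts.values())
    return correct / len(instances)


def accuracy(gold, predicted):
    """Fraction of instances where the predicted sense equals the gold sense."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical sense-tagged instances (not taken from the Mayo Clinic dataset).
TOY_INSTANCES = (
    [("cold", "illness")] * 3
    + [("cold", "temperature")]
    + [("ra", "rheumatoid_arthritis")] * 2
    + [("ra", "right_atrium")] * 2
)

baseline = most_frequent_sense_accuracy(TOY_INSTANCES)
print(f"most-frequent-sense baseline: {baseline:.3f}")
```

An unsupervised method is only interesting if its accuracy exceeds this baseline, which is why the reported 66.9% is read against 56.5% rather than against zero.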

DISCUSSION

Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words.

CONCLUSIONS

Topic modeling for WSD provides superior results in the clinical domain; however, integration of knowledge remains to be effectively exploited.
