Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods.

Affiliations

Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Department of Computer Science, University of Massachusetts, Lowell, Massachusetts, USA.

Publication information

J Am Med Inform Assoc. 2014 Sep-Oct;21(5):842-9. doi: 10.1136/amiajnl-2013-002133. Epub 2014 Jan 17.

Abstract

OBJECTIVE

To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources.

MATERIALS AND METHODS

The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation.
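As an illustration of the graph-based idea only, the following is a minimal sketch of personalized PageRank over a toy concept graph: restart mass is placed on concepts found in the context, and the candidate sense with the highest resulting rank is chosen. The node names, the edge list, and the functions `personalized_pagerank` and `disambiguate` are hypothetical; they are not real UMLS identifiers and do not reproduce the PageRank variants evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical toy graph; real systems would use UMLS concepts and relations.
EDGES = [
    ("common_cold", "viral_infection"),
    ("viral_infection", "fever"),
    ("common_cold", "cough"),
    ("cold_temperature", "hypothermia"),
    ("cold_temperature", "frostbite"),
]


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank with restart mass on the seed nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:  # treat the graph as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    seeds = [s for s in seeds if s in neighbors]
    if not seeds:  # no usable context: fall back to a uniform restart
        seeds = nodes
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[n])
            new_rank[n] = (1 - damping) * restart[n] + damping * incoming
        rank = new_rank
    return rank


def disambiguate(candidates, context_concepts, edges):
    """Pick the candidate sense that receives the most rank from the context."""
    rank = personalized_pagerank(edges, context_concepts)
    return max(candidates, key=lambda c: rank.get(c, 0.0))


best = disambiguate(["common_cold", "cold_temperature"], ["fever", "cough"], EDGES)
print(best)
```

In this toy graph the temperature sense sits in a component that the context concepts never reach, so it receives no rank at all. In a large, densely connected knowledge base the margins between senses would be much smaller.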

RESULTS

The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40-50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help.
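For readers unfamiliar with the most-frequent-sense baseline, the sketch below shows one common way to compute it: for each ambiguous word, always predict its most frequent sense in the labeled data and measure accuracy over all instances. The function name `mfs_baseline_accuracy` and the toy instances are hypothetical, and the abstract does not state whether the reported 56.5% is averaged per instance or per word, so the per-instance averaging here is an assumption.

```python
from collections import Counter, defaultdict


def mfs_baseline_accuracy(instances):
    """Accuracy of always predicting each word's most frequent sense.

    `instances` is a list of (ambiguous_word, gold_sense) pairs.
    """
    by_word = defaultdict(Counter)
    for word, sense in instances:
        by_word[word][sense] += 1
    # For each word, the baseline is correct on every instance of its top sense.
    correct = sum(c.most_common(1)[0][1] for c in by_word.values())
    return correct / len(instances)


# Hypothetical sense-tagged instances, not data from the Mayo Clinic set.
toy = (
    [("cold", "illness")] * 3
    + [("cold", "temperature")] * 1
    + [("ra", "rheumatoid arthritis")] * 2
    + [("ra", "right atrium")] * 2
)
print(mfs_baseline_accuracy(toy))  # 5 of 8 instances correct -> 0.625
```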

DISCUSSION

Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words.

CONCLUSIONS

Topic modeling for WSD provides superior results in the clinical domain; however, integration of knowledge remains to be effectively exploited.
