Evaluation of Domain-Specific Word Vectors for Biomedical Word Sense Disambiguation.

Affiliation

Chair of Medical Informatics, University Erlangen-Nuremberg, Germany.

Publication information

Stud Health Technol Inform. 2022 May 16;292:23-27. doi: 10.3233/SHTI220314.

Abstract

Among medical applications of natural language processing (NLP), word sense disambiguation (WSD) estimates alternative meanings from text around homonyms. Recently developed NLP methods include word vectors that combine easy computability with nuanced semantic representations. Here we explore the utility of simple linear WSD classifiers based on aggregating word vectors from a modern biomedical NLP library in homonym contexts. We evaluated eight WSD tasks that consider literature abstracts as textual contexts. Discriminative performance was measured in held-out annotations as the median area under sensitivity-specificity curves (AUC) across tasks and 200 bootstrap repetitions. We find that classifiers trained on domain-specific vectors outperformed those from a general language model by 4.0 percentage points, and that a preprocessing step of filtering stopwords and punctuation marks enhanced discrimination by another 0.7 points. The best models achieved a median AUC of 0.992 (interquartile range 0.975 - 0.998). These improvements suggest that more advanced WSD methods might also benefit from leveraging domain-specific vectors derived from large biomedical corpora.
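The pipeline the abstract describes (filter stopwords and punctuation, aggregate word vectors over the homonym's context, score with a linear model, report a bootstrap median AUC) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two-dimensional `TOY_VECTORS`, the stopword list, the hand-set `weights`, and the choice to resample only the held-out examples are all assumptions made for the example. A real setup would take vectors from a biomedical NLP library and fit the linear classifier on training data.

```python
import random
import statistics

# Hypothetical 2-d toy vectors (assumption); real vectors would come from an NLP library.
TOY_VECTORS = {
    "the": [0.5, 0.5],
    "cold": [0.5, 0.5],
    "virus": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "temperature": [0.1, 0.9],
    "storage": [0.2, 0.8],
}
STOPWORDS = {"the", "a", "of", "and", "at", "with"}  # illustrative list only
PUNCTUATION = {".", ",", ";", ":"}


def context_vector(tokens, filter_tokens=True):
    """Average the vectors of context tokens, optionally dropping stopwords and punctuation."""
    if filter_tokens:
        tokens = [t for t in tokens if t not in STOPWORDS and t not in PUNCTUATION]
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def linear_score(vec, weights, bias=0.0):
    """Score of a linear classifier: dot product of weights and features plus bias."""
    return sum(w * x for w, x in zip(weights, vec)) + bias


def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_median_auc(labels, scores, n_boot=200, seed=0):
    """Median AUC over bootstrap resamples of the held-out examples (assumed scheme)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one sense
        aucs.append(auc(ys, [scores[i] for i in idx]))
    return statistics.median(aucs)


# Toy held-out contexts for the homonym "cold": 1 = illness sense, 0 = temperature sense.
contexts = [
    (["the", "cold", "virus", "and", "fever", "."], 1),
    (["cold", "storage", "temperature", "."], 0),
    (["fever", "with", "the", "cold", "."], 1),
    (["the", "temperature", "of", "cold", "storage"], 0),
]
weights = [1.0, -1.0]  # hypothetical weights set by hand, not fitted
labels = [y for _, y in contexts]
scores = [linear_score(context_vector(toks), weights) for toks, _ in contexts]
median_auc = bootstrap_median_auc(labels, scores)
print(median_auc)
```

Calling `context_vector(tokens, filter_tokens=False)` keeps stopwords that have vectors in the average, which is one way to compare the filtered and unfiltered variants the abstract mentions.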

Similar articles

Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification.

J Am Med Inform Assoc. 2013 Sep-Oct;20(5):882-6. doi: 10.1136/amiajnl-2012-001350. Epub 2012 Oct 16.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues.

BMC Bioinformatics. 2006 Jul 5;7:334. doi: 10.1186/1471-2105-7-334.

Determining the difficulty of Word Sense Disambiguation.

J Biomed Inform. 2014 Feb;47:83-90. doi: 10.1016/j.jbi.2013.09.009. Epub 2013 Sep 26.

Collocation analysis for UMLS knowledge-based word sense disambiguation.

BMC Bioinformatics. 2011 Jun 9;12 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2105-12-S3-S4.

Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.

Evaluating measures of semantic similarity and relatedness to disambiguate terms in biomedical text.

J Biomed Inform. 2013 Dec;46(6):1116-24. doi: 10.1016/j.jbi.2013.08.008. Epub 2013 Sep 4.

Developing a test collection for biomedical word sense disambiguation.

Proc AMIA Symp. 2001:746-50.

Word embeddings and recurrent neural networks based on Long-Short Term Memory nodes in supervised biomedical word sense disambiguation.

J Biomed Inform. 2017 Sep;73:137-147. doi: 10.1016/j.jbi.2017.08.001. Epub 2017 Aug 7.
