Evaluation of Domain-Specific Word Vectors for Biomedical Word Sense Disambiguation.

Affiliation

Chair of Medical Informatics, University Erlangen-Nuremberg, Germany.

Publication information

Stud Health Technol Inform. 2022 May 16;292:23-27. doi: 10.3233/SHTI220314.

Abstract

Among medical applications of natural language processing (NLP), word sense disambiguation (WSD) estimates alternative meanings from text around homonyms. Recently developed NLP methods include word vectors that combine easy computability with nuanced semantic representations. Here we explore the utility of simple linear WSD classifiers based on aggregating word vectors from a modern biomedical NLP library in homonym contexts. We evaluated eight WSD tasks that consider literature abstracts as textual contexts. Discriminative performance was measured in held-out annotations as the median area under sensitivity-specificity curves (AUC) across tasks and 200 bootstrap repetitions. We find that classifiers trained on domain-specific vectors outperformed those from a general language model by 4.0 percentage points, and that a preprocessing step of filtering stopwords and punctuation marks enhanced discrimination by another 0.7 points. The best models achieved a median AUC of 0.992 (interquartile range 0.975 - 0.998). These improvements suggest that more advanced WSD methods might also benefit from leveraging domain-specific vectors derived from large biomedical corpora.
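The pipeline described in the abstract (filter stopwords and punctuation, aggregate word vectors over the homonym context, score with a linear classifier, summarize AUC over bootstrap repetitions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the stopword list, the example sentences for the homonym "cold", mean pooling as the aggregation step, and the centroid-difference rule standing in for whichever linear classifier the authors used. A real setup would load vectors from a biomedical NLP library instead.

```python
import random
import statistics
import string

# Hypothetical toy vectors and stopwords, for illustration only.
TOY_VECTORS = {
    "cold": [0.1, 0.1, 0.1],
    "virus": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "cough": [0.7, 0.3, 0.0],
    "temperature": [0.1, 0.9, 0.2],
    "exposure": [0.2, 0.8, 0.1],
    "water": [0.0, 0.7, 0.3],
}
STOPWORDS = {"the", "a", "of", "and", "with", "to", "in", "was", "after", "by"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def context_vector(text):
    """Average the vectors of all known tokens in the context."""
    vecs = [TOY_VECTORS[t] for t in tokenize(text) if t in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_linear(vectors, labels):
    """Weight vector = positive-class centroid minus negative-class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    return [p - n for p, n in zip(mean(pos), mean(neg))]


def score(weights, vector):
    """Linear score: dot product of weights and context vector."""
    return sum(w * x for w, x in zip(weights, vector))


def auc(scores, labels):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot=200, seed=0):
    """Median AUC over bootstrap resamples of the held-out examples."""
    rng = random.Random(seed)
    idx = range(len(scores))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that lack one class
            aucs.append(auc([scores[i] for i in sample], ys))
    return statistics.median(aucs)


# Label 1 = "cold" as an illness, label 0 = "cold" as low temperature.
train = [
    ("The cold was caused by a virus infection.", 1),
    ("Cough and cold with infection", 1),
    ("Exposure to cold water temperature", 0),
    ("cold temperature exposure", 0),
]
held_out = [
    ("A cold with cough", 1),
    ("virus infection after a cold", 1),
    ("cold water", 0),
    ("exposure to the cold", 0),
]

weights = fit_linear([context_vector(t) for t, _ in train], [y for _, y in train])
held_scores = [score(weights, context_vector(t)) for t, _ in held_out]
held_labels = [y for _, y in held_out]
median_auc = bootstrap_auc(held_scores, held_labels)
print(f"median bootstrap AUC: {median_auc:.3f}")
```

The toy data are separable by construction, so the resulting AUC says nothing about real performance; the sketch only shows how the steps fit together.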
