
Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks.

Affiliations

Department of Mathematics, Florida State University, Tallahassee, FL, USA.

Department of Computer Science, Florida State University, Tallahassee, FL, USA.

Publication

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):502. doi: 10.1186/s12859-019-3079-8.

Abstract

BACKGROUND

In recent years, deep learning methods have been applied to many natural language processing tasks and achieved state-of-the-art performance. In the biomedical domain, however, they have not outperformed supervised word sense disambiguation (WSD) methods based on support vector machines or random forests, possibly due to inherent similarities between medical word senses.

RESULTS

In this paper, we propose two deep-learning-based models for supervised WSD: a model based on a bidirectional long short-term memory (BiLSTM) network, and an attention model based on a self-attention architecture. Our results show that the BiLSTM model with a suitable upper-layer structure performs even better than the existing state-of-the-art models on the MSH WSD dataset, while our attention model runs 3 to 4 times faster than the BiLSTM model with good accuracy. In addition, we trained "universal" models that disambiguate all ambiguous words together: in these models, the embedding of the target ambiguous word is concatenated to the max-pooled vector, acting as a "hint". Our universal BiLSTM model achieved about 90 percent accuracy.
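The upper-layer structure of the universal model described above (max-pooling over the per-token contextual vectors, with the target word's embedding concatenated as a "hint" before a softmax sense classifier) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the dimensions are arbitrary, and random vectors stand in for the BiLSTM outputs and the pre-trained target-word embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions (not the paper's hyperparameters)
seq_len, hidden, emb, n_senses = 12, 8, 8, 2

# Stand-ins for BiLSTM outputs: one contextual vector per token,
# of size 2*hidden because forward and backward states are concatenated
context = rng.normal(size=(seq_len, 2 * hidden))

# Max-pool over the time axis to get a fixed-size sentence vector
pooled = context.max(axis=0)                     # shape (2*hidden,)

# "Hint": the target ambiguous word's embedding, concatenated to the
# pooled vector so one classifier can serve all ambiguous words
target_emb = rng.normal(size=emb)
features = np.concatenate([pooled, target_emb])  # shape (2*hidden + emb,)

# Linear softmax layer over the candidate sense labels
W = rng.normal(size=(n_senses, features.size))
b = np.zeros(n_senses)
probs = softmax(W @ features + b)

predicted_sense = int(probs.argmax())
```

The concatenated hint is what makes a single "universal" classifier possible: without it, the pooled sentence vector alone would not tell the model which word's senses it is choosing between.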

CONCLUSION

Deep contextual models based on sequential information processing are able to capture the relevant contextual information from pre-trained input word embeddings, providing state-of-the-art results for supervised biomedical WSD tasks.

