deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.

Affiliations

Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada.

Institute for Big Data Analytics, Dalhousie University, Halifax, NS B3H 4R2, Canada.

Publication information

J Am Med Inform Assoc. 2019 May 1;26(5):438-446. doi: 10.1093/jamia/ocy189.

DOI:10.1093/jamia/ocy189

PMID:30811548

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7787358/

Abstract

OBJECTIVE

In biomedicine, a wealth of information is hidden in unstructured narratives such as research articles and clinical reports. To exploit these data properly, a word sense disambiguation (WSD) algorithm is needed to prevent downstream difficulties in the natural language processing pipeline. Supervised WSD algorithms largely outperform unsupervised, semisupervised, and knowledge-based methods; however, they train one separate classifier for each ambiguous term, which requires a large amount of expert-labeled training data, an unattainable goal in medical informatics. To alleviate this need, a single model that shares statistical strength across all instances and scales well with the vocabulary size is desirable.

MATERIALS AND METHODS

Built on recent advances in deep learning, our deepBioWSD model uses a single bidirectional long short-term memory (BiLSTM) network that predicts the sense of any ambiguous term. First, Unified Medical Language System (UMLS) sense embeddings are computed from the senses' text definitions; then, after the network is initialized with these embeddings, it is trained on all available training data collectively. The method also includes a novel technique for automatically collecting training data from PubMed to (pre)train the network in an unsupervised manner.
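The two ideas in this paragraph, sense embeddings derived from text definitions and one shared scorer for every ambiguous term, can be illustrated with a minimal sketch. It is not the deepBioWSD network: the BiLSTM context encoder is replaced here by a plain average of context word vectors, and the word vectors, sense labels, and definitions are all hypothetical toy values rather than UMLS content.

```python
import math

# Hypothetical 3-dimensional word vectors; a real system would use
# pretrained embeddings of much higher dimension.
WORD_VECS = {
    "temperature": [1.0, 0.0, 0.0],
    "low":         [0.9, 0.1, 0.0],
    "weather":     [0.8, 0.0, 0.2],
    "virus":       [0.0, 1.0, 0.0],
    "infection":   [0.0, 0.9, 0.1],
    "nasal":       [0.1, 0.8, 0.1],
    "patient":     [0.0, 0.5, 0.5],
}
DIM = 3

# Hypothetical sense inventory with short definitions
# (stand-ins for UMLS concept definitions).
SENSE_DEFS = {
    "cold": {
        "cold_temperature": "low temperature weather",
        "common_cold": "virus infection nasal",
    },
}


def mean_vector(tokens):
    """Average the vectors of all known tokens; zero vector if none are known."""
    vecs = [WORD_VECS[t] for t in tokens if t in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Step 1: one embedding per sense, computed from its text definition.
SENSE_EMB = {
    term: {sense: mean_vector(defn.split()) for sense, defn in senses.items()}
    for term, senses in SENSE_DEFS.items()
}


def disambiguate(term, context_tokens):
    """Step 2: a single scorer shared by all terms picks the closest sense."""
    context = mean_vector(context_tokens)
    candidates = SENSE_EMB[term]
    return max(candidates, key=lambda s: cosine(context, candidates[s]))


prediction = disambiguate("cold", ["patient", "nasal", "infection"])
```

Because the scorer takes the ambiguous term as an argument instead of being trained per term, adding a new term only requires adding its senses and definitions to the inventory, which is the scaling property the abstract argues for.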

RESULTS

We use the MSH WSD dataset to compare WSD algorithms, with macro and micro accuracy as the evaluation metrics. deepBioWSD outperforms existing models for biomedical text WSD, achieving a state-of-the-art macro accuracy of 96.82%.
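The difference between the two metrics can be made concrete with a short sketch. It assumes the usual convention that micro accuracy pools all test instances, while macro accuracy averages the per-term accuracies so that every ambiguous term counts equally. The function name and the toy records are hypothetical and are not taken from the MSH WSD dataset.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """Compute (micro, macro) accuracy.

    records: iterable of (term, gold_sense, predicted_sense) tuples.
    """
    per_term = defaultdict(lambda: [0, 0])  # term -> [correct, total]
    for term, gold, pred in records:
        per_term[term][0] += int(gold == pred)
        per_term[term][1] += 1

    total_correct = sum(correct for correct, _ in per_term.values())
    total = sum(count for _, count in per_term.values())

    micro = total_correct / total  # every instance weighted equally
    macro = sum(c / n for c, n in per_term.values()) / len(per_term)  # every term weighted equally
    return micro, macro


# Toy predictions for two ambiguous terms (hypothetical sense labels).
records = [
    ("cold", "sense_a", "sense_a"),
    ("cold", "sense_a", "sense_a"),
    ("cold", "sense_b", "sense_a"),
    ("cold", "sense_b", "sense_b"),
    ("BAT", "sense_c", "sense_c"),
    ("BAT", "sense_d", "sense_c"),
]
micro, macro = micro_macro_accuracy(records)
```

In this toy set "cold" is 3 of 4 correct and "BAT" is 1 of 2 correct, so the pooled (micro) score is 4/6 while the per-term average (macro) is 0.625. The two diverge whenever terms have different numbers of instances.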

CONCLUSIONS

Beyond the improvement in disambiguation and the unsupervised training, deepBioWSD depends on considerably less expert-labeled data because it learns the target and context terms jointly. These merits make deepBioWSD conveniently deployable in real-time biomedical applications.
