
Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts.

Authors

Parwez Md Aslam, Fazil Mohd, Arif Muhammad, Nafis Md Tabrez, Auwul Md Rabiul

Affiliations

Department of Computer Science & Engineering, Jamia Hamdard, New Delhi, India.

University of Limerick, Limerick, Ireland.

Publication information

Comput Intell Neurosci. 2023 Feb 15;2023:2989791. doi: 10.1155/2023/2989791. eCollection 2023.

DOI: 10.1155/2023/2989791
PMID: 39262497
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11390191/
Abstract

With the growing use of information technologies by biomedical experts, researchers, public health agencies, and healthcare professionals, scientific literature, clinical notes, and other structured and unstructured text resources are accumulating rapidly in data sources such as PubMed. These massive text resources can be leveraged to extract valuable knowledge and insights using machine learning techniques. Neural network-based classification models, which take numeric vectors of the training data as input, have recently gained popularity. The better the input vectors, the more accurate the classification. Word representations are learned as the distribution of words in an embedding space, where each word has its own vector and words that are semantically similar, based on their contexts, lie close to each other. However, such distributional word representations cannot capture relational semantics between distant words. In the biomedical domain, [...] is a well-studied problem that aims to extract relational words, which associate distant entities that generally represent the subject and object of a sentence. Our goal is to capture relational semantic information between distant words from a large corpus, learn an enhanced word representation, and employ it in natural language processing tasks such as text classification. In this article, we propose using biomedical relation triplets to learn word representations by incorporating relational semantic information into the distributional representation of words. In other words, the proposed approach aims to capture both the distributional and the relational contexts of words to learn their numeric vectors from a text corpus. We also propose applying the learned word representations to text classification.
The proposed approach is evaluated on multiple benchmark datasets, and the efficacy of the learned word representations is tested on [...] and [...] tasks. Our approach performs better than the state-of-the-art GloVe model. Furthermore, we have applied the learned word representations to classify biomedical texts using four neural network-based classification models, and the classification accuracy further confirms the effectiveness of the word representations learned by our approach.
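
The idea of combining distributional and relational contexts can be illustrated at the level of co-occurrence counts, the kind of statistic that GloVe-style models are trained on. The sketch below is not the authors' method: it is a minimal illustration, assuming that relation triplets of the form (subject, relation, object) are already available and that each triplet simply adds a fixed weight to the counts between its members. The function names `distributional_counts` and `add_relational_counts`, the `weight` parameter, and the example sentence are all hypothetical.

```python
from collections import defaultdict


def distributional_counts(sentences, window=2):
    """Count symmetric co-occurrences of words within a fixed window."""
    counts = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1.0
    return counts


def add_relational_counts(counts, triplets, weight=1.0):
    """Add a fixed weight between the members of each (subject, relation, object)
    triplet. This weighting scheme is an assumption made for illustration."""
    for subj, rel, obj in triplets:
        for a, b in ((subj, rel), (rel, obj), (subj, obj)):
            counts[a][b] += weight
            counts[b][a] += weight
    return counts


# Hypothetical example: "aspirin" and "stroke" are 12 tokens apart,
# so a window of 2 never sees them together.
sentences = [
    "aspirin given daily in low doses reduces the long term risk of stroke".split()
]
triplets = [("aspirin", "reduces", "stroke")]

counts = distributional_counts(sentences, window=2)
before = counts["aspirin"].get("stroke", 0.0)

add_relational_counts(counts, triplets, weight=1.0)
after = counts["aspirin"].get("stroke", 0.0)

print(before, after)
```

With a window of 2, the purely distributional counts contain no link between the distant subject and object; the triplet supplies one. An embedding model fitted to the augmented counts would then have a signal for the relation that the window-based counts alone lack.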


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3994/11390191/cfa751e9364d/CIN2023-2989791.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3994/11390191/b7ce06063e11/CIN2023-2989791.001.jpg


