Thesaurus-based word embeddings for automated biomedical literature classification.

Authors

Koutsomitropoulos Dimitrios A, Andriopoulos Andreas D

Affiliation

Department of Computer Engineering and Informatics, School of Engineering, University of Patras, Patras, Greece.

Publication

Neural Comput Appl. 2022;34(2):937-950. doi: 10.1007/s00521-021-06053-z. Epub 2021 May 11.

DOI:10.1007/s00521-021-06053-z
PMID:33994670
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8111057/
Abstract

The special nature, volume and broadness of biomedical literature pose barriers for automated classification methods. On the other hand, manual indexing is time-consuming, costly and error-prone. We argue that current word embedding algorithms can be efficiently used to support the task of biomedical text classification even in a multilabel setting, with many distinct labels. The ontology representation of Medical Subject Headings (MeSH) provides machine-readable labels and specifies the dimensionality of the problem space. Both deep and shallow network approaches are implemented. Predictions are determined by the similarity between features extracted from contextualized representations of abstracts and headings. The addition of a separate classifier for transfer learning is also proposed and evaluated. Large datasets of biomedical citations are harvested for their metadata and used for training and testing. These automated approaches are still far from entirely substituting human experts, yet they can be useful as a mechanism for validation and recommendation. Dataset balancing, distributed processing and training parallelization on GPUs all play an important part in the effectiveness and performance of the proposed methods.
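
A minimal sketch of similarity-based multilabel prediction, assuming averaged static word vectors as a stand-in for the contextualized representations the abstract mentions. The vocabulary, the 3-dimensional vectors, the heading-term lists and the `k` and `threshold` values are hypothetical toy values, not the authors' model or data; only the idea of scoring each MeSH heading by its similarity to the abstract and keeping several headings per document comes from the abstract above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(tokens: Sequence[str], vocab: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens; out-of-vocabulary tokens are skipped."""
    vecs = [vocab[t] for t in tokens if t in vocab]
    if not vecs:
        raise ValueError("none of the tokens are in the vocabulary")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_headings(
    abstract_tokens: Sequence[str],
    headings: Dict[str, Sequence[str]],
    vocab: Dict[str, Vector],
    k: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank headings by similarity to the abstract; keep up to k that clear the threshold."""
    doc = mean_vector(abstract_tokens, vocab)
    scored = [(name, cosine(doc, mean_vector(terms, vocab))) for name, terms in headings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(name, score) for name, score in scored[:k] if score >= threshold]


# Hypothetical toy vectors (assumption); a real system would load trained embeddings.
VOCAB: Dict[str, Vector] = {
    "heart": [1.0, 0.0, 0.0],
    "failure": [0.8, 0.2, 0.0],
    "cardiac": [0.9, 0.1, 0.0],
    "diabetes": [0.0, 1.0, 0.0],
    "mellitus": [0.1, 0.9, 0.0],
    "glucose": [0.1, 0.8, 0.1],
    "neoplasms": [0.0, 0.0, 1.0],
}

# Hypothetical label set: each heading is represented by its own lowercased terms.
HEADINGS: Dict[str, List[str]] = {
    "Heart Failure": ["heart", "failure"],
    "Diabetes Mellitus": ["diabetes", "mellitus"],
    "Neoplasms": ["neoplasms"],
}

if __name__ == "__main__":
    tokens = ["cardiac", "failure", "glucose", "patients"]  # "patients" is out of vocabulary
    for name, score in predict_headings(tokens, HEADINGS, VOCAB):
        print(f"{name}: {score:.3f}")
```

Keeping every heading above a threshold (capped at `k`) rather than a single best match is what makes the prediction multilabel; in practice the threshold would need tuning on held-out data.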

Similar articles

1
Thesaurus-based word embeddings for automated biomedical literature classification.
Neural Comput Appl. 2022;34(2):937-950. doi: 10.1007/s00521-021-06053-z. Epub 2021 May 11.
2
Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.
J Am Med Inform Assoc. 2019 Nov 1;26(11):1247-1254. doi: 10.1093/jamia/ocz149.
3
Multi-Ontology Refined Embeddings (MORE): A hybrid multi-ontology and corpus-based semantic representation model for biomedical concepts.
J Biomed Inform. 2020 Nov;111:103581. doi: 10.1016/j.jbi.2020.103581. Epub 2020 Oct 1.
4
Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy.
BMC Bioinformatics. 2009 Jan 21;10:28. doi: 10.1186/1471-2105-10-28.
5
Improved biomedical word embeddings in the transformer era.
J Biomed Inform. 2021 Aug;120:103867. doi: 10.1016/j.jbi.2021.103867. Epub 2021 Jul 18.
6
Comparing general and specialized word embeddings for biomedical named entity recognition.
PeerJ Comput Sci. 2021 Feb 18;7:e384. doi: 10.7717/peerj-cs.384. eCollection 2021.
7
Generating contextual embeddings for emergency department chief complaints.
JAMIA Open. 2020 Jul 15;3(2):160-166. doi: 10.1093/jamiaopen/ooaa022. eCollection 2020 Jul.
8
Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts.
J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.
9
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
10
Biomedical semantic indexing by deep neural network with multi-task learning.
BMC Bioinformatics. 2018 Dec 21;19(Suppl 20):502. doi: 10.1186/s12859-018-2534-2.

Cited by

1
Impact of word embedding models on text analytics in deep learning environment: a review.
Artif Intell Rev. 2023 Feb 22:1-81. doi: 10.1007/s10462-023-10419-1.