DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Authors

Peng Shengwen, You Ronghui, Wang Hongning, Zhai Chengxiang, Mamitsuka Hiroshi, Zhu Shanfeng

Affiliations

School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China.

Department of Computer Science, University of Virginia, Charlottesville 22904-4740, USA.

Publication information

Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.

DOI:10.1093/bioinformatics/btw294

PMID:27307646

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4908368/

Abstract

MOTIVATION

Medical Subject Headings (MeSH) indexing, the task of assigning a set of MeSH main headings to each citation, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing poses challenges on two sides: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) from the National Library of Medicine and the state-of-the-art method MeSHLabeler, represent text as a bag of words, which captures semantic and context-dependent information poorly.

METHODS

We propose DeepMeSH, which incorporates deep semantic information into large-scale MeSH indexing and addresses the challenges on both the citation side and the MeSH side. The citation-side challenge is addressed by a new deep semantic representation, D2V-TFIDF, which concatenates sparse and dense semantic representations. The MeSH-side challenge is addressed by the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation.
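The concatenation behind D2V-TFIDF can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the TF-IDF weighting is one common smoothed variant, the dense vectors are made-up numbers standing in for document embeddings that a doc2vec-style model would produce, and scaling each part to unit length before concatenation is assumed so that neither part dominates. The helper names `tfidf_vectors`, `l2_normalize`, and `d2v_tfidf` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant; other weightings are possible).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def d2v_tfidf(dense_vec, sparse_vec):
    """Concatenate a unit-length dense vector with a unit-length TF-IDF vector."""
    return l2_normalize(dense_vec) + l2_normalize(sparse_vec)


# Toy citations, already tokenized.
docs = [
    ["mesh", "indexing", "citation"],
    ["deep", "semantic", "citation"],
]
vocab, sparse = tfidf_vectors(docs)

# Placeholder dense embeddings (hypothetical values, not trained doc2vec output).
dense = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]

combined = [d2v_tfidf(d, s) for d, s in zip(dense, sparse)]
print(len(combined[0]))  # dense dimensions plus vocabulary size
```

In this sketch each citation ends up with one vector whose first part carries the dense semantic signal and whose second part keeps the exact term evidence, so a downstream classifier or nearest-neighbour search can draw on both.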

RESULTS

On BioASQ3 challenge data with 6000 citations, DeepMeSH achieved a Micro F-measure of 0.6323, about 2% higher than MeSHLabeler's 0.6218 and about 12% higher than MTI's 0.5637 (relative improvements).
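As a quick check on these figures, the sketch below shows one standard way to compute a micro-averaged F-measure over multi-label predictions and then reproduces the relative gains from the three reported scores. The function names `micro_f` and `relative_gain` and the toy label sets are hypothetical; only the three scores come from the abstract.

```python
def micro_f(gold, predicted):
    """Micro-averaged F-measure: pool label counts over all citations first."""
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy example with made-up MeSH headings for two citations.
gold = [{"Humans", "Neoplasms"}, {"Algorithms"}]
predicted = [{"Humans"}, {"Algorithms", "Software"}]
toy_score = micro_f(gold, predicted)

# Scores reported in the abstract.
deepmesh, meshlabeler, mti = 0.6323, 0.6218, 0.5637
gain_vs_meshlabeler = relative_gain(deepmesh, meshlabeler)
gain_vs_mti = relative_gain(deepmesh, mti)
print(f"{gain_vs_meshlabeler:.1%} over MeSHLabeler, {gain_vs_mti:.1%} over MTI")
```

The absolute differences are about 1 and 7 points of F-measure, so the quoted 2% and 12% only make sense as relative improvements over each baseline.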

AVAILABILITY AND IMPLEMENTATION

The software is available upon request.

CONTACT

zhusf@fudan.edu.cn

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.

Methods Mol Biol. 2018;1807:203-209. doi: 10.1007/978-1-4939-8561-6_15.

FullMeSH: improving large-scale MeSH indexing with full text.

Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

MeSHProbeNet: a self-attentive probe net for MeSH indexing.

Bioinformatics. 2019 Oct 1;35(19):3794-3802. doi: 10.1093/bioinformatics/btz142.

BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text.

Bioinformatics. 2021 May 5;37(5):684-692. doi: 10.1093/bioinformatics/btaa837.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

MeSH indexing based on automatically generated summaries.

BMC Bioinformatics. 2013 Jun 26;14:208. doi: 10.1186/1471-2105-14-208.

Biomedical semantic indexing by deep neural network with multi-task learning.

BMC Bioinformatics. 2018 Dec 21;19(Suppl 20):502. doi: 10.1186/s12859-018-2534-2.

Feature engineering for MEDLINE citation categorization with MeSH.

BMC Bioinformatics. 2015 Apr 8;16:113. doi: 10.1186/s12859-015-0539-7.

Cited by

Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features.

medRxiv. 2025 Apr 28:2025.04.23.25326300. doi: 10.1101/2025.04.23.25326300.

Context-Aware Contrastive Representation Learning for Zero-Shot Biomedical Text Classification.

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2024 Dec;2024:3611-3614. doi: 10.1109/bibm62325.2024.10822585.

Is metadata of articles about COVID-19 enough for multilabel topic classification task?

Database (Oxford). 2024 Oct 21;2024. doi: 10.1093/database/baae106.

Identification of Drug-Disease Associations Using a Random Walk with Restart Method and Supervised Learning.

Comput Math Methods Med. 2022 Oct 10;2022:7035634. doi: 10.1155/2022/7035634. eCollection 2022.

Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Database (Oxford). 2022 Jul 1;2022. doi: 10.1093/database/baac047.

Multi-probe attention neural network for COVID-19 semantic indexing.

BMC Bioinformatics. 2022 Jun 29;23(1):259. doi: 10.1186/s12859-022-04803-x.

Thesaurus-based word embeddings for automated biomedical literature classification.

Neural Comput Appl. 2022;34(2):937-950. doi: 10.1007/s00521-021-06053-z. Epub 2021 May 11.

COS: A new MeSH term embedding incorporating corpus, ontology, and semantic predications.

PLoS One. 2021 May 4;16(5):e0251094. doi: 10.1371/journal.pone.0251094. eCollection 2021.

Automatic MeSH Indexing: Revisiting the Subheading Attachment Problem.

AMIA Annu Symp Proc. 2021 Jan 25;2020:1031-1040. eCollection 2020.

Tackling Research Inefficiency in Degenerative Cervical Myelopathy: Illustrative Review.

JMIR Res Protoc. 2020 Jun 11;9(6):e15922. doi: 10.2196/15922.

