FullMeSH: improving large-scale MeSH indexing with full text.

Affiliations

School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.

Publication information

Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

PMID:31596475

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7523651/

Abstract

MOTIVATION

With the rapidly growing biomedical literature, automatically indexing biomedical articles by Medical Subject Heading (MeSH), namely MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by using limited information, i.e. only the title and abstract of biomedical articles.

RESULTS

We propose FullMeSH, a large-scale MeSH indexing method taking advantage of the recent increase in the availability of full text articles. Compared to DeepMeSH and other state-of-the-art methods, FullMeSH has three novelties: (i) Instead of using a full text as a whole, FullMeSH segments it into several sections with their normalized titles in order to distinguish their contributions to the overall performance. (ii) FullMeSH integrates the evidence from different sections in a 'learning to rank' framework by combining the sparse and deep semantic representations. (iii) FullMeSH trains an Attention-based Convolutional Neural Network for each section, which achieves better performance on infrequent MeSH headings. FullMeSH has been developed and empirically trained on the entire set of 1.4 million full-text articles in the PubMed Central Open Access subset. It achieved a Micro F-measure of 66.76% on a test set of 10 000 articles, which was 3.3% and 6.4% higher than DeepMeSH and MeSHLabeler, respectively. Furthermore, FullMeSH demonstrated an average improvement of 4.7% over DeepMeSH for indexing Check Tags, a set of most frequently indexed MeSH headings.
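The Micro F-measure reported above pools true positives, false positives and false negatives over all article-heading pairs before computing precision and recall. The following is a minimal sketch of that standard metric, not the authors' evaluation script; the function name `micro_f_measure` and the toy heading sets are assumptions made for illustration.

```python
from typing import Iterable, Set


def micro_f_measure(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F-measure over all (article, MeSH heading) pairs.

    Hypothetical helper: `gold` and `predicted` hold one set of MeSH
    headings per article, in the same article order.
    """
    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, predicted):
        tp += len(gold_set & pred_set)   # headings correctly predicted
        fp += len(pred_set - gold_set)   # predicted but not in the gold annotation
        fn += len(gold_set - pred_set)   # gold headings that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up for illustration): two articles with gold and predicted headings.
gold = [{"Humans", "Neoplasms"}, {"Mice", "Animals"}]
pred = [{"Humans", "Neoplasms", "Female"}, {"Mice"}]
score = micro_f_measure(gold, pred)
```

Because the counts are pooled across articles, frequently indexed headings such as Check Tags weigh more heavily in this score than rare headings do.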

AVAILABILITY AND IMPLEMENTATION

The software is available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

FullMeSH: improving large-scale MeSH indexing with full text.

Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text.

Bioinformatics. 2021 May 5;37(5):684-692. doi: 10.1093/bioinformatics/btaa837.

DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.

MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.

Methods Mol Biol. 2018;1807:203-209. doi: 10.1007/978-1-4939-8561-6_15.

MeSHProbeNet: a self-attentive probe net for MeSH indexing.

Bioinformatics. 2019 Oct 1;35(19):3794-3802. doi: 10.1093/bioinformatics/btz142.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

MeSH indexing based on automatically generated summaries.

BMC Bioinformatics. 2013 Jun 26;14:208. doi: 10.1186/1471-2105-14-208.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

Influence of automated indexing in Medical Subject Headings (MeSH) selection for pharmacy practice journals.

Res Social Adm Pharm. 2024 Sep;20(9):911-917. doi: 10.1016/j.sapharm.2024.06.003. Epub 2024 Jun 12.

MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.

J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.

Cited by

Enhancing automated indexing of publication types and study designs in biomedical literature using full-text features.

medRxiv. 2025 Apr 28:2025.04.23.25326300. doi: 10.1101/2025.04.23.25326300.

Algorithmic indexing in MEDLINE frequently overlooks important concepts and may compromise literature search results.

J Med Libr Assoc. 2025 Jan 14;113(1):39-48. doi: 10.5195/jmla.2025.1936.

Scientometric analysis of trends in global research on acne treatment.

Int J Womens Dermatol. 2023 Jul 28;9(3):e082. doi: 10.1097/JW9.0000000000000082. eCollection 2023 Oct.

LitCovid ensemble learning for COVID-19 multi-label classification.

Database (Oxford). 2022 Nov 25;2022. doi: 10.1093/database/baac103.

Use of 'Pharmaceutical services' Medical Subject Headings (MeSH) in articles assessing pharmacists' interventions.

Explor Res Clin Soc Pharm. 2022 Aug 20;7:100172. doi: 10.1016/j.rcsop.2022.100172. eCollection 2022 Sep.

Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Database (Oxford). 2022 Jul 1;2022. doi: 10.1093/database/baac047.

Multi-probe attention neural network for COVID-19 semantic indexing.

BMC Bioinformatics. 2022 Jun 29;23(1):259. doi: 10.1186/s12859-022-04803-x.

A multiyear systematic survey of the quality of reporting for randomised trials in dentistry, neurology and geriatrics published in journals of Spain and Latin America.

BMC Med Res Methodol. 2021 Jul 26;21(1):153. doi: 10.1186/s12874-021-01337-3.

Thesaurus-based word embeddings for automated biomedical literature classification.

Neural Comput Appl. 2022;34(2):937-950. doi: 10.1007/s00521-021-06053-z. Epub 2021 May 11.

Recent advances in biomedical literature mining.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa057.

References

MeSHProbeNet: a self-attentive probe net for MeSH indexing.

Bioinformatics. 2019 Oct 1;35(19):3794-3802. doi: 10.1093/bioinformatics/btz142.

PMC text mining subset in BioC: about three million full-text articles and growing.

Bioinformatics. 2019 Sep 15;35(18):3533-3535. doi: 10.1093/bioinformatics/btz070.

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res. 2019 Jan 8;47(D1):D23-D28. doi: 10.1093/nar/gky1069.

MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.

J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.

12 years on - Is the NLM medical text indexer still useful and relevant?

J Biomed Semantics. 2017 Feb 23;8(1):8. doi: 10.1186/s13326-017-0113-5.

DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Bioinformatics. 2016 Jun 15;32(12):i70-i79. doi: 10.1093/bioinformatics/btw294.

Extracting Characteristics of the Study Subjects from Full-Text Articles.

AMIA Annu Symp Proc. 2015 Nov 5;2015:484-91. eCollection 2015.

Efficient Semisupervised MEDLINE Document Clustering With MeSH-Semantic and Global-Content Constraints.

IEEE Trans Cybern. 2013 Aug;43(4):1265-76. doi: 10.1109/TSMCB.2012.2227998.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.
