FullMeSH: improving large-scale MeSH indexing with full text.

Affiliations

School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China.

National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.

Publication information

Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

Abstract

MOTIVATION

With the rapidly growing biomedical literature, automatically indexing biomedical articles with Medical Subject Headings (MeSH), known as MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past few years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is limited because they use only part of the available information, namely the title and abstract of biomedical articles.

RESULTS

We propose FullMeSH, a large-scale MeSH indexing method that takes advantage of the growing availability of full-text articles. Compared to DeepMeSH and other state-of-the-art methods, FullMeSH has three novelties: (i) Instead of treating a full text as a whole, FullMeSH segments it into sections with normalized section titles, in order to distinguish the contribution of each section to the overall performance. (ii) FullMeSH integrates the evidence from different sections in a 'learning to rank' framework by combining sparse and deep semantic representations. (iii) FullMeSH trains an Attention-based Convolutional Neural Network for each section, which achieves better performance on infrequent MeSH headings. FullMeSH was developed and empirically trained on the entire set of 1.4 million full-text articles in the PubMed Central Open Access subset. It achieved a Micro F-measure of 66.76% on a test set of 10 000 articles, which was 3.3% and 6.4% higher than DeepMeSH and MeSHLabeler, respectively. Furthermore, FullMeSH showed an average improvement of 4.7% over DeepMeSH for indexing Check Tags, a set of the most frequently indexed MeSH headings.
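The headline result is reported as a Micro F-measure. The abstract does not include the evaluation code. The following is a minimal Python sketch that assumes the standard micro-averaged definition commonly used for MeSH indexing benchmarks such as BioASQ. Under that definition, true positives, false positives and false negatives are pooled over every (article, heading) pair before precision and recall are computed. The function name `micro_f_measure` and the toy headings are hypothetical and are not taken from FullMeSH.

```python
from typing import Iterable, Set, Tuple


def micro_f_measure(
    pairs: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float, float]:
    """Return (micro precision, micro recall, micro F) for (gold, predicted) pairs.

    Each pair holds the gold-standard MeSH headings of one article and the
    headings predicted for it. Counts are pooled over all articles (micro
    averaging) instead of averaging per-article scores.
    """
    tp = fp = fn = 0
    for gold, predicted in pairs:
        tp += len(gold & predicted)   # headings predicted and correct
        fp += len(predicted - gold)   # headings predicted but not in gold
        fn += len(gold - predicted)   # gold headings that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical toy data: two articles with gold and predicted headings.
articles = [
    ({"Humans", "Female", "Neoplasms"}, {"Humans", "Female", "Mice"}),
    ({"Animals", "Mice"}, {"Animals", "Mice", "Rats", "Male"}),
]
p, r, f = micro_f_measure(articles)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")  # P=0.5714 R=0.8000 F=0.6667
```

Counts are pooled across all articles, so frequent headings such as Check Tags dominate a micro-averaged score. This is likely one reason the abstract reports the effect on infrequent headings separately.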

AVAILABILITY AND IMPLEMENTATION

The software is available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
