MeSH indexing based on automatically generated summaries.

Affiliation

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2013 Jun 26;14:208. doi: 10.1186/1471-2105-14-208.

DOI:10.1186/1471-2105-14-208

PMID:23802936

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3706357/

Abstract

BACKGROUND

MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using the Medical Subject Headings (MeSH) controlled vocabulary as a reference. For this task, the human indexers read the full text of the article. Due to the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the task of the indexers. Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, the input to MTI is limited to MEDLINE citations, that is, title and abstract only. Previous work has shown that using full text as input to MTI increases recall, but decreases precision sharply. We propose using summaries generated automatically from the full text as the input to MTI when suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that if the results were good enough, manual indexers could possibly use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high quality of indexing results.

RESULTS

We have generated summaries of different lengths using two different summarizers, and evaluated the MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those of full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision.
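The comparison above is stated in terms of precision and recall of suggested MeSH headings against the headings assigned by human indexers. As a minimal sketch of how those two measures behave, assuming set-based matching of heading strings (the function name and the example heading sets below are illustrative and are not taken from the paper):

```python
from typing import Set, Tuple


def precision_recall_f1(suggested: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compare suggested MeSH headings with the manually assigned (gold) ones."""
    true_positives = len(suggested & gold)
    # Precision: share of suggested headings that the indexers also assigned.
    precision = true_positives / len(suggested) if suggested else 0.0
    # Recall: share of indexer-assigned headings that were suggested.
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical headings for a single article (made up for illustration).
gold = {"Humans", "MEDLINE", "Medical Subject Headings", "Abstracting and Indexing"}

# A short input tends to yield few, mostly correct suggestions.
from_citation = {"Humans", "MEDLINE", "Medical Subject Headings"}

# A long input tends to cover more gold headings but adds spurious ones.
from_full_text = gold | {"Algorithms", "Software", "Animals", "Neoplasms"}

for name, suggested in [("citation", from_citation), ("full text", from_full_text)]:
    p, r, f = precision_recall_f1(suggested, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy example the short input misses one gold heading but suggests nothing wrong, while the long input recovers every gold heading at the cost of four spurious suggestions, which mirrors the trade-off the abstract reports between MEDLINE citations and full text.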

CONCLUSIONS

Our results show that automatic summaries produce better indexing than full text articles. Summaries produce similar recall to full text but much better precision, which seems to indicate that automatic summaries can efficiently capture the most important contents within the original articles. The combination of MEDLINE citations and automatically generated summaries could improve the recommendations suggested by MTI. On the other hand, indexing performance might be dependent on the MeSH heading being indexed. Summarization techniques could thus be considered as a feature selection algorithm that might have to be tuned individually for each MeSH heading.
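The idea of treating summarization as feature selection, and of combining a MEDLINE citation with a summary, can be sketched as follows. This is only an illustration under stated assumptions: it uses a simple term-frequency extractive summarizer, which is not one of the two summarizers evaluated in the paper, and the function names, the stopword list, and the `max_sentences` parameter are hypothetical. It does not call MTI; it only builds the text that would be passed to an indexer.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "are",
             "for", "on", "with", "that", "this", "we", "by", "as"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(full_text: str, max_sentences: int) -> str:
    """Keep the sentences whose terms are most frequent in the whole document."""
    sentences = split_sentences(full_text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, then restore document order for readability.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def build_indexer_input(title: str, abstract: str, full_text: str, max_sentences: int = 3) -> str:
    """Concatenate the citation (title and abstract) with a summary of the full text."""
    return " ".join([title, abstract, summarize(full_text, max_sentences)])
```

Here `max_sentences` plays the role of the tunable selection threshold: a larger value moves the input toward full text, and a smaller value moves it toward the citation alone. Tuning it per MeSH heading, as the conclusions suggest might be necessary, would be a separate step that this sketch does not attempt.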

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71b1/3706357/eee2b1fc84ec/1471-2105-14-208-1.jpg

Similar articles

MEDRank: using graph-based concept ranking to index biomedical texts.

Int J Med Inform. 2011 Jun;80(6):431-41. doi: 10.1016/j.ijmedinf.2011.02.008. Epub 2011 Mar 25.

12 years on - Is the NLM medical text indexer still useful and relevant?

J Biomed Semantics. 2017 Feb 23;8(1):8. doi: 10.1186/s13326-017-0113-5.

A recent advance in the automatic indexing of the biomedical literature.

J Biomed Inform. 2009 Oct;42(5):814-23. doi: 10.1016/j.jbi.2008.12.007. Epub 2008 Dec 30.

Automated indexing using NLM's Medical Text Indexer (MTI) compared to human indexing in Medline: a pilot study.

J Med Libr Assoc. 2023 Jul 10;111(3):684-694. doi: 10.5195/jmla.2023.1588.

Automatic inference of indexing rules for MEDLINE.

BMC Bioinformatics. 2008 Nov 19;9 Suppl 11(Suppl 11):S11. doi: 10.1186/1471-2105-9-S11-S11.

Manual versus machine: How accurately does the Medical Text Indexer (MTI) classify different document types into disease areas?

PLoS One. 2024 Mar 13;19(3):e0297526. doi: 10.1371/journal.pone.0297526. eCollection 2024.

Comparison and combination of several MeSH indexing approaches.

AMIA Annu Symp Proc. 2013 Nov 16;2013:709-18. eCollection 2013.

Automatic MeSH Indexing: Revisiting the Subheading Attachment Problem.

AMIA Annu Symp Proc. 2021 Jan 25;2020:1031-1040. eCollection 2020.

Fine-grained indexing of the biomedical literature: MeSH subheading attachment for a MEDLINE indexing tool.

AMIA Annu Symp Proc. 2007 Oct 11;2007:553-7.

Cited by

Use of 'Pharmaceutical services' Medical Subject Headings (MeSH) in articles assessing pharmacists' interventions.

Explor Res Clin Soc Pharm. 2022 Aug 20;7:100172. doi: 10.1016/j.rcsop.2022.100172. eCollection 2022 Sep.

CERC: an interactive content extraction, recognition, and construction tool for clinical and biomedical text.

BMC Med Inform Decis Mak. 2020 Dec 15;20(Suppl 14):306. doi: 10.1186/s12911-020-01330-8.

FullMeSH: improving large-scale MeSH indexing with full text.

Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

Perspective: An Extension of the STROBE Statement for Observational Studies in Nutritional Epidemiology (STROBE-nut): Explanation and Elaboration.

Adv Nutr. 2017 Sep 15;8(5):652-678. doi: 10.3945/an.117.015941. Print 2017 Sep.

Extracting Characteristics of the Study Subjects from Full-Text Articles.

AMIA Annu Symp Proc. 2015 Nov 5;2015:484-91. eCollection 2015.

Boosting for high-dimensional two-class prediction.

BMC Bioinformatics. 2015 Sep 21;16:300. doi: 10.1186/s12859-015-0723-9.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

Stochastic Gradient Descent and the Prediction of MeSH for PubMed Records.

AMIA Annu Symp Proc. 2014 Nov 14;2014:1198-207. eCollection 2014.

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition.

BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

Feasibility and implementation of a literature information management system for human papillomavirus in head and neck cancers with imaging.

Cancer Inform. 2014 Oct 13;13(Suppl 1):49-57. doi: 10.4137/CIN.S13884. eCollection 2014.
