MeSH: a window into full text for document summarization.

Affiliation

Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA.

Publication information

Bioinformatics. 2011 Jul 1;27(13):i120-8. doi: 10.1093/bioinformatics/btr223.

DOI:10.1093/bioinformatics/btr223

PMID:21685060

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3117369/

Abstract

MOTIVATION

Research in the biomedical text-mining domain has historically been limited to the titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process, not only in terms of the resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is a greater risk of false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full text documents that contain only the important portions. In the long term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full text documents.
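The idea of using MeSH terms as clues for picking text segments can be illustrated with a minimal sketch. The abstract does not specify the scoring function, so everything here is an assumption for illustration only: the helper names `tokenize` and `select_sentences`, the plain word-overlap score, and the `top_k` cutoff are hypothetical and are not the authors' method.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_sentences(sentences, mesh_terms, top_k=2):
    """Return up to top_k sentences that share the most words with the MeSH terms.

    Hypothetical scoring (an assumption, not taken from the paper): a sentence's
    score is the number of distinct MeSH-term words it contains.
    """
    # Build a vocabulary from all words that appear in the assigned MeSH terms.
    vocab = set()
    for term in mesh_terms:
        vocab.update(tokenize(term))

    # Score each sentence by its overlap with that vocabulary.
    scored = []
    for idx, sentence in enumerate(sentences):
        overlap = len(vocab & set(tokenize(sentence)))
        scored.append((overlap, idx))

    # Highest overlap first; ties are broken by earlier position in the text.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))

    # Keep only sentences with a non-zero score, restored to document order.
    chosen = sorted(idx for score, idx in ranked[:top_k] if score > 0)
    return [sentences[i] for i in chosen]
```

Calling `select_sentences(full_text_sentences, ["Insulin Resistance", "Mice, Obese"], top_k=5)` would return a reduced version of the document in its original sentence order. A real system would likely need term weighting and some handling of MeSH synonyms, which this sketch leaves out.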

RESULTS

Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts.
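For readers unfamiliar with the ROUGE F-scores reported above, the following is a simplified sketch of how a ROUGE-N F-score is computed from clipped n-gram overlap between a candidate summary and a reference. It assumes whitespace tokenization and a single reference, and it omits the stemming and stop-word options of the official toolkit as well as the skip-bigram logic behind ROUGE-SU4. The function names `ngrams` and `rouge_n_f` are illustrative, not part of any library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f(candidate, reference, n=1):
    """Simplified ROUGE-N F-score for one candidate and one reference string."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    # Balanced F-score: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

With `n=1` this corresponds to a ROUGE-1 style score and with `n=2` to a ROUGE-2 style score. Because of the simplifications above, the values will not match those of the official ROUGE package exactly.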

CONTACT

sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu.
