The power of graphs in medicine: Introducing BioGraphSum for effective text summarization.

Author information

Hark Cengiz

Affiliation

İnönü University, Department of Computer Engineering, 44000, Malatya, Turkey.

Publication information

Heliyon. 2024 May 27;10(11):e31813. doi: 10.1016/j.heliyon.2024.e31813. eCollection 2024 Jun 15.

Abstract

In biomedicine, the vast scientific literature, combined with the frequent use of abbreviations, acronyms, and symbols, poses considerable challenges for text processing and summarization. Studies in this area have commonly relied on the Unified Medical Language System (UMLS) to extract concepts and determine correlations; the BioGraphSum model introduced in this study aims to reduce this dependence on UMLS. BioGraphSum represents the sentences of a text as nodes in a graph, which makes it possible to apply the concept of "Malatya centrality". The approach identifies the most influential nodes in the graph and, by analogy, the most pertinent sentences in the text for inclusion in the summary. To evaluate BioGraphSum, a corpus of 450 recent scientific research articles available in the PubMed database was curated in line with established research methodology, and the model was tested rigorously against this corpus. Preliminary results showed significant improvements over existing state-of-the-art text summarization models, particularly in the precision-based and F-score-based ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU metrics.
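
The abstract describes the pipeline only at a high level, so the following Python sketch illustrates the general idea under stated assumptions; it is not the BioGraphSum implementation. Sentences become nodes, and two sentences are linked when the Jaccard overlap of their word sets reaches a threshold (a hypothetical edge rule; the abstract does not say how edges are built). Nodes are scored with Malatya centrality, taken here as the degree-ratio definition psi(v) = sum over neighbours u of deg(v)/deg(u); this is an assumption, and the paper's exact graph construction and weighting may differ. The top-k sentences are returned in their original order. A simplified ROUGE-N (no stemming or stopword removal, unlike the official ROUGE toolkit) computes the precision, recall, and F-score that the evaluation refers to. All function names and parameters (`split_sentences`, `build_sentence_graph`, `malatya_centrality`, `summarize`, `rouge_n`, `threshold`, `k`) are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text):
    """Lower-cased alphanumeric tokens (no stemming, no stopword removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_sentence_graph(sentences, threshold=0.1):
    """Hypothetical edge rule: link two sentences when the Jaccard
    overlap of their word sets reaches `threshold`."""
    token_sets = [set(words(s)) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        union = token_sets[i] | token_sets[j]
        if union and len(token_sets[i] & token_sets[j]) / len(union) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def malatya_centrality(adj):
    """Assumed definition: psi(v) = sum over neighbours u of deg(v) / deg(u).
    Every neighbour has degree >= 1, so there is no division by zero;
    isolated nodes score 0."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    return {v: sum(deg[v] / deg[u] for u in adj[v]) for v in adj}


def summarize(text, k=3, threshold=0.1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = malatya_centrality(build_sentence_graph(sentences, threshold))
    # Rank by score (descending); break ties by position in the text.
    top = sorted(scores, key=lambda v: (-scores[v], v))[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap -> (precision, recall, F1)."""
    def ngrams(text):
        toks = words(text)
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # Counter '&' keeps minimum counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    text = ("Graph methods rank sentences. Sentences are nodes in a graph. "
            "Centrality ranks graph nodes. The weather was pleasant.")
    summary = summarize(text, k=1, threshold=0.2)
    print(summary)  # ['Sentences are nodes in a graph.']
    print(rouge_n(" ".join(summary), "Sentences are graph nodes.", n=1))
```

On the toy text, the second sentence is the only one linked to both other graph-related sentences, so its degree ratios give it the highest score, while the unrelated weather sentence is isolated and scores 0. The similarity measure, threshold, and k are illustrative only and would need tuning on real articles.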

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e325/11154598/68a071dcaa31/ga1.jpg