Different approaches for identifying important concepts in probabilistic biomedical text summarization.

Author information

Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran.

Publication information

Artif Intell Med. 2018 Jan;84:101-116. doi: 10.1016/j.artmed.2017.11.004. Epub 2017 Dec 6.

Abstract

Automatic text summarization tools help users in the biomedical domain acquire the information they need from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, measures other than raw frequency, or measures that account for correlations between concepts, may be more useful for identifying valuable content in an input document for this type of summarization. In this paper, we describe a Bayesian summarization method for biomedical text documents. The Bayesian summarizer first maps the input text to Unified Medical Language System (UMLS) concepts, then selects the important ones to be used as classification features. We introduce six different feature selection approaches to identify the most important concepts of the text and select the most informative content according to the distribution of these concepts. We show that, with an appropriate feature selection approach, the Bayesian summarizer can improve the performance of biomedical summarization. Using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit, we perform extensive evaluations on a corpus of scientific papers in the biomedical domain. The results show that when the Bayesian summarizer uses feature selection methods that do not rely on raw frequency, it can outperform biomedical summarizers that rely on concept frequency, as well as domain-independent and baseline methods.
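The pipeline described in the abstract (map sentences to concepts, select important concepts as features, classify sentences with a Bayesian model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: sentences are taken to be already mapped to concept identifiers (the UMLS mapping step is not shown), the feature selector is a simple sentence-count threshold standing in for the six approaches the abstract mentions but does not specify, and the classifier is a hand-written Bernoulli naive Bayes trained on hypothetical labeled sentences. All function names are hypothetical.

```python
import math
from collections import Counter


def select_features(sentences, min_sentences=2):
    """Keep concepts that appear in at least `min_sentences` sentences.

    Illustrative stand-in only; the abstract does not describe the
    paper's six feature selection approaches.
    """
    doc_freq = Counter(c for s in sentences for c in set(s))
    return sorted(c for c, n in doc_freq.items() if n >= min_sentences)


def train_bernoulli_nb(sentences, labels, features):
    """Estimate class priors and per-concept presence probabilities.

    Labels: 1 = summary sentence, 0 = non-summary sentence.
    Laplace (add-one) smoothing avoids zero probabilities.
    """
    model = {}
    for cls in (0, 1):
        rows = [set(s) for s, y in zip(sentences, labels) if y == cls]
        prior = (len(rows) + 1) / (len(sentences) + 2)
        probs = {
            f: (sum(f in r for r in rows) + 1) / (len(rows) + 2)
            for f in features
        }
        model[cls] = (prior, probs)
    return model


def posterior_summary(sentence, model, features):
    """Return P(summary | presence/absence of each selected concept)."""
    present = set(sentence)
    log_scores = {}
    for cls, (prior, probs) in model.items():
        lp = math.log(prior)
        for f in features:
            p = probs[f]
            lp += math.log(p if f in present else 1.0 - p)
        log_scores[cls] = lp
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    exp = {c: math.exp(v - top) for c, v in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def summarize(sentences, model, features, k=2):
    """Return indices of the k highest-posterior sentences, in document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: posterior_summary(sentences[i], model, features),
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical training data: each sentence is a list of concept labels.
train = [
    ["gene", "disease"], ["gene", "protein"], ["method"],
    ["method", "dataset"], ["gene", "disease", "protein"], ["dataset"],
]
labels = [1, 1, 0, 0, 1, 0]

features = select_features(train)
model = train_bernoulli_nb(train, labels, features)

document = [["method"], ["gene", "disease"], ["dataset"], ["gene", "protein"]]
print(summarize(document, model, features, k=2))
```

In this toy setup, sentences containing concepts that co-occur with summary labels in the training data receive a higher posterior and are selected. Swapping `select_features` for a different measure is where alternatives to raw frequency would plug in.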
