Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran.
Department of Computer Engineering, Shahreza Campus, University of Isfahan, Iran.
J Biomed Inform. 2021 Apr;116:103706. doi: 10.1016/j.jbi.2021.103706. Epub 2021 Feb 18.
Automatic text summarization methods generate a shorter version of the input text so that the reader can quickly grasp its main content. Existing methods generally focus on a single aspect of the text when selecting sentences, which can cause essential information to be lost. In this study, we propose a domain-specific method that models a document as a multi-layer graph so that multiple features of the text can be processed at the same time. The features used are word similarity, semantic similarity, and co-reference similarity, each modelled as a separate layer. The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts. The proposed MultiGBS algorithm employs UMLS and extracts concepts and relationships using tools such as SemRep, MetaMap, and OGER. Extensive evaluation with ROUGE and BERTScore shows increased F-measure values.
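The multi-layer idea can be illustrated with a minimal sketch. It is not the MultiGBS implementation and does not reproduce MultiRank: as a simplifying assumption, the three layers are collapsed into one weighted graph and ranked with an ordinary PageRank-style power iteration. The Jaccard overlap used for every layer, the layer weights, and all function names (`build_layer`, `rank_sentences`, `select_summary`) are hypothetical, and the concept-count criterion mentioned in the abstract is omitted. In practice the per-sentence feature sets would come from tools such as MetaMap or SemRep; here they are supplied by hand.

```python
from typing import List, Set

Matrix = List[List[float]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two feature sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_layer(features: List[Set[str]]) -> Matrix:
    """One graph layer: sentences are nodes, edge weight = feature overlap."""
    n = len(features)
    return [
        [jaccard(features[i], features[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rank_sentences(
    layers: List[Matrix],
    layer_weights: List[float],
    damping: float = 0.85,
    iterations: int = 100,
) -> List[float]:
    """Score sentences on the weighted sum of all layers (assumed stand-in
    for MultiRank) using PageRank-style power iteration."""
    n = len(layers[0])
    # Collapse the layers into a single weighted adjacency matrix.
    combined = [
        [sum(w * layer[i][j] for w, layer in zip(layer_weights, layers))
         for j in range(n)]
        for i in range(n)
    ]
    strength = [sum(row) for row in combined]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass held by sentences with no edges is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if strength[i] == 0)
        updated = []
        for j in range(n):
            incoming = sum(
                scores[i] * combined[i][j] / strength[i]
                for i in range(n)
                if strength[i] > 0
            )
            updated.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        scores = updated
    return scores


def select_summary(scores: List[float], k: int) -> List[int]:
    """Indices of the k best-scoring sentences, in document order."""
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(best)


# Hand-made feature sets for four sentences (illustrative only).
words = [{"aspirin", "reduces", "pain"}, {"aspirin", "dose", "pain"},
         {"dose", "adults"}, {"weather", "sunny"}]
concepts = [{"C_aspirin", "C_pain"}, {"C_aspirin", "C_pain"},
            {"C_dose"}, set()]
coref_chains = [{"chain1"}, {"chain1"}, {"chain1"}, set()]

layers = [build_layer(words), build_layer(concepts), build_layer(coref_chains)]
scores = rank_sentences(layers, layer_weights=[1.0, 1.0, 1.0])
print(select_summary(scores, k=2))
```

Summing the layers is the simplest way to let several similarity signals influence one ranking. It does not model how layers influence one another, which is the part a multi-layer ranking algorithm is meant to capture.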