Akhmetov Iskander, Mussabayev Rustam, Gelbukh Alexander
Kazakh-British Technical University, Almaty, Almaty, Kazakhstan.
Institute of Information and Computational Technologies, Almaty, Almaty, Kazakhstan.
PeerJ Comput Sci. 2022 Sep 26;8:e1103. doi: 10.7717/peerj-cs.1103. eCollection 2022.
Extractive text summarization (ETS) automatically finds the salient information in a text by selecting exact sentences from the source. In this article, we ask what summary quality can be achieved with ETS methods. To maximize the ROUGE-1 score, we used five approaches: (1) adapted reduced variable neighborhood search (RVNS), (2) Greedy algorithm, (3) VNS initialized by Greedy algorithm results, (4) genetic algorithm, and (5) genetic algorithm initialized by the Greedy algorithm results. We ran experiments on articles from the arXiv dataset. We found that ROUGE-1 and ROUGE-2 scores of 0.59 and 0.25, respectively, are achievable with the genetic algorithm initialized by the Greedy algorithm results, which yielded the best results among the tested approaches. These scores are higher than those obtained by current state-of-the-art text summarization models: the best ROUGE-1 score in the literature on the same dataset is 0.46. Therefore, there is room for further development of ETS methods, which are now undeservedly forgotten.
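To make the Greedy approach concrete, the following is a minimal sketch of greedy sentence selection that maximizes ROUGE-1 against a reference summary. It is an illustration under stated assumptions, not the authors' implementation: the names `tokenize`, `rouge1_f1`, and `greedy_select`, the regex tokenizer, the use of the F1 variant of ROUGE-1, and the `max_sentences` cap are all hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between two token lists (assumed ROUGE-1 variant)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Multiset intersection counts each shared unigram at most min(count) times.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_select(sentences, reference, max_sentences=3):
    """Add, one at a time, the sentence that most improves ROUGE-1 F1.

    Stops when no remaining sentence improves the score or when
    max_sentences (a hypothetical cap) is reached.
    Returns the selected sentence indices in source order and the score.
    """
    ref_tokens = tokenize(reference)
    sent_tokens = [tokenize(s) for s in sentences]
    selected = []
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            candidate = [t for j in selected + [i] for t in sent_tokens[j]]
            score = rouge1_f1(candidate, ref_tokens)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no sentence improves the current summary
            break
        selected.append(best_idx)
    return sorted(selected), best_score


if __name__ == "__main__":
    source = [
        "the cat sat on the mat",
        "stocks fell sharply today",
        "a dog barked at the cat",
    ]
    reference = "the cat sat on the mat while a dog barked"
    print(greedy_select(source, reference))
```

Because the selection is scored against the reference summary, a procedure like this estimates an upper bound on what extraction can achieve rather than serving as a deployable summarizer. A selection of this kind could also seed a search method such as VNS or a genetic algorithm, as in approaches (3) and (5).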