Debnath Dipanwita, Das Ranjita, Pakray Partha
National Institute of Technology Mizoram, Mizoram, 796012 India.
National Institute of Technology Silchar, Assam, 788010 India.
Appl Intell (Dordr). 2023;53(10):12268-12287. doi: 10.1007/s10489-022-04149-0. Epub 2022 Sep 24.
The tremendous amount of information available online has created broad interest in extracting relevant information in a compact and meaningful way, which has prompted the need for automatic text summarization. In the proposed system, automatic text summarization is treated as an extractive single-document summarization problem, and a Cat Swarm Optimization (CSO) algorithm-based approach is proposed to solve it, with the objective of generating good summaries in terms of content coverage, informativeness, anti-redundancy, and readability. In this work, input documents are first pre-processed. The cat population is then initialized: each individual (cat), represented as a binary vector, is randomly initialized in the search space subject to the constraint. The objective function is then formulated from different sentence quality measures. The Best Cat Memory Pool (BCMP) is initialized based on the objective function score. After that, in each iteration, individuals are randomly assigned to seeking or tracing mode according to the mixture ratio, and their positions are updated in that mode. BCMP is also updated accordingly. Finally, after the last iteration, an optimal individual is chosen to generate the summary. The DUC-2001 and DUC-2002 data sets and ROUGE measures are used for system evaluation, and the results are compared with various state-of-the-art methods. We achieved approximately 25% and 5% improvement in ROUGE-1 and ROUGE-2 scores, respectively, on these data sets over the best existing method mentioned in this paper, indicating the proposed method's superiority. The proposed system is also evaluated in terms of generational distance, CPU processing time, cohesion, and readability factor, showing that the system-generated summaries are readable, concise, and relevant, and that they are produced quickly. We also conducted a two-sample t-test and a one-way ANOVA test, which show that the results of the proposed approach are statistically significant.
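The pipeline described in the abstract (constrained binary initialization, an objective score, seeking/tracing updates split by a mixture ratio, and a remembered best individual) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: the objective is a toy score (vocabulary coverage minus pairwise Jaccard redundancy) that leaves out the informativeness and readability measures, the constraint is assumed to be a word budget, the BCMP is reduced to a single best cat, and seeking and tracing are simplified to bit flips and bit copying. All function and parameter names (`fitness`, `random_cat`, `seek`, `trace`, `cso_summarize`, `max_words`, `mixture_ratio`, `pull`) are hypothetical.

```python
import random

def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def fitness(cat, tokens, max_words):
    """Toy objective (assumption): vocabulary coverage minus mean redundancy."""
    chosen = [t for bit, t in zip(cat, tokens) if bit]
    if not chosen or sum(len(t) for t in chosen) > max_words:
        return float("-inf")  # infeasible: empty or over the word budget
    vocab = set().union(*tokens)
    coverage = len(set().union(*chosen)) / len(vocab)
    sets = [set(t) for t in chosen]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    redundancy = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return coverage - redundancy

def random_cat(lengths, max_words, rng):
    """Random binary vector that respects the word budget."""
    cat, used = [0] * len(lengths), 0
    for i in rng.sample(range(len(lengths)), len(lengths)):
        # Always take the first sentence that fits, then flip a coin.
        if used + lengths[i] <= max_words and (used == 0 or rng.random() < 0.5):
            cat[i], used = 1, used + lengths[i]
    return cat

def seek(cat, score, rng, copies=5):
    """Seeking mode (simplified): flip one bit in a few copies, keep the best."""
    best, best_score = cat, score(cat)
    for _ in range(copies):
        cand = cat[:]
        cand[rng.randrange(len(cat))] ^= 1
        cand_score = score(cand)
        if cand_score > best_score:
            best, best_score = cand, cand_score
    return best

def trace(cat, best, rng, pull=0.5):
    """Tracing mode (simplified): copy each bit from the best cat with prob. pull."""
    return [b if rng.random() < pull else c for c, b in zip(cat, best)]

def cso_summarize(sentences, max_words, n_cats=10, iters=30,
                  mixture_ratio=0.3, seed=0):
    rng = random.Random(seed)
    tokens = [s.lower().split() for s in sentences]
    lengths = [len(t) for t in tokens]

    def score(cat):
        return fitness(cat, tokens, max_words)

    cats = [random_cat(lengths, max_words, rng) for _ in range(n_cats)]
    best = max(cats, key=score)[:]  # single-entry stand-in for the BCMP
    for _ in range(iters):
        for k, cat in enumerate(cats):
            # The mixture ratio decides which cats trace and which seek.
            if rng.random() < mixture_ratio:
                cats[k] = trace(cat, best, rng)
            else:
                cats[k] = seek(cat, score, rng)
        challenger = max(cats, key=score)
        if score(challenger) > score(best):
            best = challenger[:]
    return [s for bit, s in zip(best, sentences) if bit]
```

Calling `cso_summarize(sentences, max_words=12)` on a list of sentence strings returns the selected sentences in document order. Because the stored best is replaced only by a strictly better score, it stays within the word budget as long as at least one sentence fits it.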