Abadi Vahid Nejad Mahmood, Ghasemian Fahimeh
Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran.
Sci Rep. 2025 Jan 2;15(1):80. doi: 10.1038/s41598-024-78235-3.
In the contemporary era, grappling with the vast expanse of big data presents a formidable obstacle, particularly when it comes to extracting vital information from extensive textual sources. The constant influx of news articles from various agencies demands an enormous amount of time to digest comprehensively. A viable solution to this challenge lies in automatic text summarization, a pivotal and intricate task within natural language processing. Text summarization transforms pertinent textual content into a concise form, reducing its word count without compromising its underlying meaning. In recent years, transformers have emerged as a prominent force in natural language processing, particularly in text summarization. This research harnesses the power of transformers by training the mT5-base model through a three-step fine-tuning process on Persian news articles. Subsequently, reinforcement learning via the proximal policy optimization (PPO) algorithm is integrated with the fine-tuned model. Finally, we evaluate the model's performance in summarizing Persian texts, shedding light on its efficacy in distilling meaningful insights from a sea of textual data. Our model has set a new benchmark in Persian text summarization, achieving ROUGE scores of 53.17 for ROUGE-1, 37.12 for ROUGE-2, and 44.13 for ROUGE-L. These results reflect a significant advancement in the quality of Persian text summarization, signaling a promising era of more refined and context-aware summaries.
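The ROUGE-1 and ROUGE-2 figures reported above measure unigram and bigram overlap (F1) between a generated summary and a reference summary. As a minimal illustrative sketch of how such a score is computed (this is not the authors' evaluation code, which would use a standard ROUGE package with proper Persian tokenization), a whitespace-tokenized ROUGE-N F1 can be written as:

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N F1: n-gram overlap between candidate and reference summaries."""
    def ngrams(tokens, n):
        # Multiset of n-grams, so repeated n-grams are counted correctly
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each n-gram matches at most as often as it appears in both
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: two of three unigrams overlap, so precision = recall = F1 = 2/3
print(rouge_n("the cat sat", "the cat ran", n=1))
```

Production evaluation would additionally apply stemming or language-specific tokenization and compute ROUGE-L via longest common subsequence rather than fixed n-grams.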