School of Computing, Universiti Teknologi Malaysia, Johor, Malaysia.
Department of Computer Science, Yusuf Maitama Sule University, Kano, Nigeria.
PLoS One. 2023 May 9;18(5):e0285376. doi: 10.1371/journal.pone.0285376. eCollection 2023.
Automatic text summarization is one of the most promising solutions to the ever-growing volume of textual data, as it produces a shorter version of the original document while conveying the same information. Despite advances in automatic text summarization research, the development of summarization methods for documents written in Hausa, a Chadic language widely spoken in West Africa by approximately 150,000,000 people as either a first or second language, is still in its early stages. This study proposes a novel graph-based extractive single-document summarization method for Hausa text that modifies the existing PageRank algorithm by using the normalized count of common bigrams between adjacent sentences as the initial vertex score. The proposed method is evaluated with the ROUGE evaluation toolkit on a primary Hausa summarization evaluation dataset comprising 113 Hausa news articles. The proposed approach outperformed the standard methods on the same dataset: it outperformed the TextRank method by 2.1%, LexRank by 12.3%, the centroid-based method by 19.5%, and the BM25 method by 17.4%.
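The idea of seeding a PageRank-style sentence ranking with a bigram-based initial vertex score can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the normalization (shared bigrams with the previous and next sentence, divided by the sentence's own bigram count), the edge weight (Jaccard word overlap), the naive sentence splitter, and the fixed iteration count. The function names `initial_scores` and `summarize` are hypothetical. Because a damped iteration run to convergence generally washes out its starting values, the sketch uses a small fixed number of iterations so that the initial scores still influence the ranking.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased word tokens; no Hausa-specific preprocessing is attempted here.
    return re.findall(r"\w+", sentence.lower())


def bigrams(words):
    # Set of adjacent word pairs in one sentence.
    return set(zip(words, words[1:]))


def initial_scores(bigram_sets):
    # Assumed normalization: bigrams shared with the previous and next
    # sentence, divided by the number of bigrams in the sentence itself.
    n = len(bigram_sets)
    scores = []
    for i, own in enumerate(bigram_sets):
        neighbours = [bigram_sets[j] for j in (i - 1, i + 1) if 0 <= j < n]
        shared = sum(len(own & other) for other in neighbours)
        scores.append(shared / len(own) if own else 0.0)
    return scores


def overlap(a, b):
    # Assumed edge weight: Jaccard similarity of the two word sets.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(text, k=2, damping=0.85, iterations=10):
    sents = split_sentences(text)
    words = [tokenize(s) for s in sents]
    word_sets = [set(w) for w in words]
    n = len(sents)

    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [overlap(word_sets[i], word_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Start from the bigram-based scores instead of a uniform vector.
    scores = initial_scores([bigrams(w) for w in words])

    # Fixed number of weighted PageRank updates (TextRank-style formula).
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Pick the k best sentences and return them in document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sents[i] for i in sorted(best)]
```

Calling `summarize(article_text, k=3)` would return the three highest-ranked sentences in their original order. A realistic system would add language-appropriate tokenization and stop-word handling, which this sketch omits.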