Big Data Research Group, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.
Department of Linguistics, Faculty of Foreign Languages, University of Isfahan, Isfahan, Iran.
Comput Intell Neurosci. 2022 Jul 18;2022:3661286. doi: 10.1155/2022/3661286. eCollection 2022.
Question answering (QA) systems have attracted considerable attention in recent years. They receive a user's question in natural language and respond with a precise answer. Most QA work was initially proposed for English, although some recent studies address non-English languages. Answer selection (AS) is a critical component of QA systems. To the best of our knowledge, there is no prior research on AS for the Persian language. Persian is a (1) free word-order, (2) right-to-left, (3) morphologically rich, and (4) low-resource language. Deep learning (DL) techniques have shown promising accuracy in AS. Although DL performs very well on QA, it requires a considerable amount of annotated data for training. Many annotated datasets have been built for the AS task, but most of them are exclusively in English. To address the need for a high-quality AS dataset in Persian, we present PASD, the first large-scale native AS dataset for the Persian language. To show the quality of PASD, we use it to train state-of-the-art QA systems. We also present PerAnSel, a novel deep neural network-based system for Persian question answering. Because Persian is a free word-order language, PerAnSel runs a sequential method and a transformer-based method in parallel to handle the various word orders in Persian. We then evaluate PerAnSel on three datasets: PASD, PerCQA, and WikiFA. The experimental results indicate strong performance on the Persian datasets: in terms of mean reciprocal rank (MRR), PerAnSel outperforms state-of-the-art answer selection methods by 10.66% on PASD, 8.42% on PerCQA, and 3.08% on WikiFA.
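The results above are reported in terms of MRR, so a minimal sketch of how mean reciprocal rank is commonly computed for answer selection may help. The function name, the input layout (one list of 0/1 relevance labels per question, already sorted by model score), and the choice to count questions with no correct candidate as zero are assumptions made for illustration; they are not taken from the paper's evaluation code.

```python
from typing import Sequence


def mean_reciprocal_rank(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Compute MRR over a set of questions.

    Each inner sequence holds 0/1 relevance labels for one question's
    candidate answers, ordered from the highest to the lowest model score.
    Assumption: a question with no correct candidate contributes 0.
    """
    if not ranked_labels:
        return 0.0
    total = 0.0
    for labels in ranked_labels:
        for rank, label in enumerate(labels, start=1):
            if label:
                # Only the first correct answer counts toward the score.
                total += 1.0 / rank
                break
    return total / len(ranked_labels)


# Hypothetical toy data: three questions with three ranked candidates each.
example = [
    [0, 1, 0],  # first correct answer at rank 2 -> 1/2
    [1, 0, 0],  # first correct answer at rank 1 -> 1
    [0, 0, 0],  # no correct answer -> 0 under the assumption above
]
score = mean_reciprocal_rank(example)
print(f"MRR = {score:.3f}")
```

Some evaluations drop questions without any correct candidate instead of scoring them as zero, which raises the reported MRR, so scores are only comparable when the same convention is used.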