Cui Yiming, Liu Ting, Che Wanxiang, Chen Zhigang, Wang Shijin
Research Center for SCIR, Harbin Institute of Technology, Harbin 150001, China.
State Key Laboratory of Cognitive Intelligence, iFLYTEK Research, Beijing 100010, China.
Heliyon. 2022 Apr 19;8(4):e09290. doi: 10.1016/j.heliyon.2022.e09290. eCollection 2022 Apr.
Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, it is necessary to provide both answer prediction and its explanation to further improve the MRC system's reliability, especially for real-life applications. In this paper, we propose a new benchmark called ExpMRC for evaluating the textual explainability of MRC systems. ExpMRC contains four subsets, including SQuAD, CMRC 2018, RACE+, and C3, with additional annotations of the answer's evidence. The MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art PLMs to build baseline systems and adopt various unsupervised approaches to extract both answer and evidence spans without human-annotated evidence spans. The experimental results show that these models are still far from human performance, suggesting that ExpMRC is challenging. Resources (data and baselines) are available through https://github.com/ymcui/expmrc.
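As a rough illustration of the evidence-extraction setting described above, the sketch below selects the passage sentence with the highest lexical overlap with the question and predicted answer as the evidence span. This is a minimal hypothetical heuristic, not the paper's actual PLM-based baselines; all function names here are assumptions for illustration.

```python
# Minimal sketch of an unsupervised evidence-selection heuristic (hypothetical,
# not the ExpMRC baseline): pick the passage sentence that best overlaps the
# question plus the predicted answer.
import re


def split_sentences(passage: str) -> list[str]:
    # Naive sentence splitter on ., !, ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def extract_evidence(passage: str, question: str, answer: str) -> str:
    """Return the passage sentence with the highest word overlap with question + answer."""
    query_tokens = set((question + " " + answer).lower().split())
    best_sentence, best_score = "", -1.0
    for sentence in split_sentences(passage):
        sent_tokens = set(sentence.lower().split())
        # Normalize overlap by sentence length to avoid favoring long sentences.
        overlap = len(query_tokens & sent_tokens) / (len(sent_tokens) or 1)
        if overlap > best_score:
            best_sentence, best_score = sentence, overlap
    return best_sentence


if __name__ == "__main__":
    passage = ("The Eiffel Tower was completed in 1889. "
               "It was designed by Gustave Eiffel's engineering company. "
               "Today it is one of the most visited monuments in the world.")
    print(extract_evidence(passage, "When was the Eiffel Tower completed?", "1889"))
```

In ExpMRC-style evaluation, such a predicted evidence sentence would be scored against the human-annotated evidence span alongside the answer prediction.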