Deriu Jan, Rodrigo Alvaro, Otegi Arantxa, Echegoyen Guillermo, Rosset Sophie, Agirre Eneko, Cieliebak Mark
Zurich University of Applied Sciences (ZHAW), Steinberggasse 13, 8400 Winterthur, Switzerland.
NLP & IR Group, UNED, C/Juan del Rosal 16, 28040 Madrid, Spain.
Artif Intell Rev. 2021;54(1):755-810. doi: 10.1007/s10462-020-09866-x. Epub 2020 Jun 25.
In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the need for human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). For each class, we introduce the main technologies developed for that type of dialogue system and then present the corresponding evaluation methods.