Deriu Jan, Rodrigo Alvaro, Otegi Arantxa, Echegoyen Guillermo, Rosset Sophie, Agirre Eneko, Cieliebak Mark
Zurich University of Applied Sciences (ZHAW), Steinberggasse 13, 8400 Winterthur, Switzerland.
NLP & IR Group, UNED, C/Juan del Rosal 16, 28040 Madrid, Spain.
Artif Intell Rev. 2021;54(1):755-810. doi: 10.1007/s10462-020-09866-x. Epub 2020 Jun 25.
In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Dialogue systems are often evaluated by means of human evaluations and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the need for human labour. In this survey, we present the main concepts and methods. To this end, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). For each class, we introduce the main technologies developed for that type of dialogue system and then present the corresponding evaluation methods.