Cognitive Science Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London.
Dialogue Systems Group, Faculty of Linguistics and Literature, Bielefeld University.
Top Cogn Sci. 2018 Apr;10(2):425-451. doi: 10.1111/tops.12324. Epub 2018 Mar 8.
Miscommunication phenomena such as repair in dialogue are important indicators of the quality of communication. Automatic detection is therefore a key step toward tools that can characterize communication quality and thus help in applications from call center management to mental health monitoring. However, most existing computational linguistic approaches to these phenomena are unsuitable for general use in this way, and particularly for analyzing human-human dialogue: Although models of other-repair are common in human-computer dialogue systems, they tend to focus on specific phenomena (e.g., repair initiation by systems), missing the range of repair and repair initiation forms used by humans; and while self-repair models for speech recognition and understanding are advanced, they tend to focus on removal of "disfluent" material important for full understanding of the discourse contribution, and/or rely on domain-specific knowledge. We explain the requirements for more satisfactory models, including incrementality of processing and robustness to sparsity. We then describe models for self- and other-repair detection that meet these requirements (for the former, an adaptation of an existing repair model; for the latter, an adaptation of standard techniques) and investigate how they perform on datasets from a range of dialogue genres and domains, with promising results.