Degraded and computer-generated speech processing in a bonobo.

Affiliations

Department of Psychology, University of York, York, UK.

Center for the Interdisciplinary Study of Language Evolution, University of Zurich, Zurich, Switzerland.

Publication information

Anim Cogn. 2022 Dec;25(6):1393-1398. doi: 10.1007/s10071-022-01621-9. Epub 2022 May 20.

DOI:10.1007/s10071-022-01621-9

PMID:35595881

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9652166/

Abstract

The human auditory system can process human speech even when it has been heavily degraded, for example by noise-vocoding, which strongly reduces frequency domain-based cues to phonetic content. This has contributed to arguments that speech processing is highly specialized and likely a de novo evolved trait in humans. Previous comparative research has demonstrated that a language-competent chimpanzee was also capable of recognizing degraded speech, and therefore that the mechanisms underlying speech processing may not be uniquely human. However, to form a robust reconstruction of the evolutionary origins of speech processing, additional data from other closely related ape species are needed. Specifically, such data can help disentangle whether these capabilities evolved independently in humans and chimpanzees, or if they were inherited from our last common ancestor. Here we provide evidence of processing of highly varied (degraded and computer-generated) speech in a language-competent bonobo, Kanzi. We took advantage of Kanzi's existing proficiency with touchscreens and his ability to report his understanding of human speech through interacting with arbitrary symbols called lexigrams. Specifically, we asked Kanzi to recognise both human (natural) and computer-generated forms of 40 highly familiar words that had been degraded (noise-vocoded and sinusoidal forms) using a match-to-sample paradigm. Results suggest that, apart from noise-vocoded computer-generated speech, Kanzi recognised both natural and computer-generated voices that had been degraded at rates significantly above chance. Kanzi performed better with all forms of natural voice speech compared to computer-generated speech. This work provides additional support for the hypothesis that the processing apparatus necessary to deal with highly variable speech, including, for the first time in nonhuman animals, computer-generated speech, may be at least as old as the last common ancestor we share with bonobos and chimpanzees.
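To make "significantly above chance" in a match-to-sample task concrete, the following is a minimal sketch of a one-sided exact binomial test. It is not the authors' analysis: the abstract does not state the statistical method, the number of response options per trial, or the per-condition trial counts, so the function name `binomial_p_above_chance` and the values `n_choices`, `trials`, and `correct` are hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_p_above_chance(correct: int, trials: int, chance: float) -> float:
    """One-sided exact binomial p-value: P(X >= correct) when each trial
    is answered correctly with probability `chance`."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must lie strictly between 0 and 1")
    # Sum the binomial probabilities of every outcome at least as good
    # as the observed number of correct responses.
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the paper.
    n_choices = 3   # assumed number of lexigrams offered per trial
    trials = 40     # assumed number of trials in one condition
    correct = 24    # assumed number of correct selections
    chance = 1.0 / n_choices
    p = binomial_p_above_chance(correct, trials, chance)
    print(f"{correct}/{trials} correct, chance = {chance:.2f}, one-sided p = {p:.4g}")
```

Under these assumed numbers, a small p-value would indicate that the observed accuracy is unlikely if responses were random guesses among the offered symbols. A separate test of this kind could be run for each stimulus condition (for example natural noise-vocoded or computer-generated sinusoidal speech).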
