Department of Psychology, University of Edinburgh, Edinburgh, UK.
Front Psychol. 2013 Jun 21;4:340. doi: 10.3389/fpsyg.2013.00340. eCollection 2013.
It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles and Coupland, 1991; Giles et al., 1991). We argue that none of these accounts can explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (such as ET and MT) with higher-level accounts (such as CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering and Garrod, 2013). Like both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account, phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation, i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate).
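The proposed mechanism can be illustrated with a deliberately simplified sketch. It is not a model specified in the article: the single phonetic dimension, the linear update rule, the function name `simulate_convergence`, and the `gain` parameter are all assumptions made for illustration. The listener's forward-model prediction is taken to be their own current production target, the prediction error is the difference between the heard value and that prediction, and a context-dependent gain scales how much of the error is turned into a motor adjustment.

```python
def simulate_convergence(own_target, heard_values, gain):
    """Toy sketch of prediction-error-driven phonetic convergence.

    own_target   -- listener's initial production target on one phonetic
                    dimension (hypothetical units, e.g. Hz or ms)
    heard_values -- sequence of values heard from the interlocutor
    gain         -- assumed context-dependent weight between 0 and 1;
                    0 means no adjustment, 1 means full imitation
    Returns the listener's target after each heard token.
    """
    target = own_target
    trajectory = []
    for heard in heard_values:
        predicted = target              # forward-model prediction (assumed: own target)
        error = heard - predicted       # sensory prediction error
        target = target + gain * error  # motor adjustment scaled by context
        trajectory.append(target)
    return trajectory


if __name__ == "__main__":
    heard = [200.0, 200.0]
    # Hypothetical contexts mapped to hypothetical gains.
    contexts = {"no intention to imitate": 0.0, "intention to imitate": 0.5}
    for label, gain in contexts.items():
        print(label, simulate_convergence(100.0, heard, gain))
```

With a gain of 0 the listener's target never moves, whereas any positive gain moves it toward the interlocutor's value with each exposure. In this sketch the gain is the only place where linguistic and non-linguistic context enters; how context should actually be mapped onto such a weight is left open here.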