David Poeppel, William J. Idsardi, Virginie van Wassenhove
Department of Linguistics, University of Maryland, College Park, MD 20742, USA.
Philos Trans R Soc Lond B Biol Sci. 2008 Mar 12;363(1493):1071-86. doi: 10.1098/rstb.2007.2160.
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate, as output, discrete representations that make contact with the lexical representations stored in long-term memory. Because the perceptual objects recognized by the speech perception system enter into subsequent linguistic computation, the format used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue for, and provide neurobiological and psychophysical evidence supporting, the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms and approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon as sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and the primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.