Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Saxony, Germany.
Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Saxony, Germany.
PLoS Comput Biol. 2021 Mar 3;17(3):e1008787. doi: 10.1371/journal.pcbi.1008787. eCollection 2021 Mar.
Frequency modulation (FM) is a basic constituent of vocalisation in many animals as well as in humans. In human speech, short rising and falling FM-sweeps of around 50 ms duration, called formant transitions, characterise individual speech sounds. There are two representations of FM in the ascending auditory pathway: a spectral representation, holding the instantaneous frequency of the stimuli; and a sweep representation, consisting of neurons that respond selectively to FM direction. To date, computational models have used feedforward mechanisms to explain FM encoding. However, we know from neuroanatomy that there are massive feedback projections in the auditory pathway. Here, we found that a classical FM-sweep perceptual effect, the sweep pitch shift, cannot be explained by standard feedforward processing models. We hypothesised that the sweep pitch shift is caused by a predictive feedback mechanism. To test this hypothesis, we developed a novel model of FM encoding incorporating a predictive interaction between the sweep and the spectral representations. The model was designed to encode sweeps of the duration, modulation rate, and modulation shape of formant transitions. It fully accounted for experimental data that we acquired in a perceptual experiment with human participants as well as previously published experimental results. We also designed a new class of stimuli for a second perceptual experiment to further validate the model. Combined, our results indicate that predictive interaction between the frequency-encoding and direction-encoding neural representations plays an important role in the neural processing of FM. In the brain, this mechanism is likely to occur at early stages of the processing hierarchy.
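As a purely illustrative sketch of the two quantities named above, the following Python code synthesises a linear FM sweep and returns both its instantaneous frequency track (the kind of information a spectral representation holds) and its direction (the kind of information a sweep representation holds). It is not the model described in this abstract. The function names, the 48 kHz sample rate, the linear modulation shape, and the 1000 Hz and 1200 Hz endpoints are assumptions chosen for the example; only the 50 ms duration is taken from the text.

```python
import math


def linear_fm_sweep(f_start, f_end, duration=0.05, sample_rate=48000):
    """Return (samples, inst_freq) for a linear FM sweep.

    Hypothetical helper for illustration only: duration defaults to 50 ms
    as in the text; sample_rate and the linear shape are assumptions.
    """
    n = int(round(duration * sample_rate))
    rate = (f_end - f_start) / duration  # modulation rate in Hz per second
    samples, inst_freq = [], []
    for i in range(n):
        t = i / sample_rate
        # The phase is 2*pi times the time integral of the instantaneous frequency.
        phase = 2 * math.pi * (f_start * t + 0.5 * rate * t * t)
        samples.append(math.sin(phase))
        inst_freq.append(f_start + rate * t)
    return samples, inst_freq


def sweep_direction(inst_freq):
    """Classify a frequency track as 'up', 'down', or 'flat' (hypothetical helper)."""
    delta = inst_freq[-1] - inst_freq[0]
    if delta > 0:
        return "up"
    if delta < 0:
        return "down"
    return "flat"


if __name__ == "__main__":
    # Example endpoints are arbitrary, not values from the study.
    samples, freqs = linear_fm_sweep(1000.0, 1200.0)
    print(len(samples), freqs[0], sweep_direction(freqs))
```

Swapping the two endpoint arguments produces a falling sweep with the same duration and absolute modulation rate.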