Christian Brodbeck, Thomas Hannagan, James S. Magnuson
Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada.
Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America.
PLoS Comput Biol. 2025 Jul 28;21(7):e1013244. doi: 10.1371/journal.pcbi.1013244. eCollection 2025 Jul.
Human speech recognition transforms a continuous acoustic signal into categorical linguistic units, by aggregating information that is distributed in time. It has been suggested that this kind of information processing may be understood through the computations of a Recurrent Neural Network (RNN) that receives input frame by frame, linearly in time, but builds an incremental representation of this input through a continually evolving internal state. While RNNs can simulate several key behavioral observations about human speech and language processing, it is unknown whether RNNs also develop computational dynamics that resemble human neural speech processing. Here we show that the internal dynamics of long short-term memory (LSTM) RNNs, trained to recognize speech from auditory spectrograms, predict human neural population responses to the same stimuli, beyond predictions from auditory features. Variations in the RNN architecture motivated by cognitive principles further improved this predictive power. Specifically, modifications that allow more human-like phonetic competition also led to more human-like temporal dynamics. Overall, our results suggest that RNNs provide plausible computational models of the cortical processes supporting human speech recognition.
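The frame-by-frame processing described in the abstract can be illustrated with a minimal sketch of a single-layer LSTM in plain Python. This is not the authors' model: the weights are random and untrained, the toy "spectrogram" is random numbers, and every name here (`init_lstm`, `lstm_step`, `run_lstm`, the layer sizes) is a hypothetical choice for illustration. It only shows how a recurrent network receives one spectrogram frame at a time and carries an evolving internal state forward, so that the state at each time step depends on all frames heard so far and on none that follow.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(n_in, n_hidden, seed=0):
    """Random, untrained weights (assumption: a trained model would learn these).

    Each gate has one row per hidden unit; columns are [input, hidden, bias].
    """
    rng = random.Random(seed)
    n_cols = n_in + n_hidden + 1
    gates = ("input", "forget", "output", "candidate")
    return {
        gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)]
               for _ in range(n_hidden)]
        for gate in gates
    }


def lstm_step(weights, frame, h, c):
    """Advance the internal state (h, c) by one spectrogram frame."""
    z = list(frame) + list(h) + [1.0]  # current frame, previous state, bias term

    def pre_activation(gate):
        return [sum(w * v for w, v in zip(row, z)) for row in weights[gate]]

    i = [sigmoid(a) for a in pre_activation("input")]
    f = [sigmoid(a) for a in pre_activation("forget")]
    o = [sigmoid(a) for a in pre_activation("output")]
    g = [math.tanh(a) for a in pre_activation("candidate")]

    # Cell state: keep part of the old memory, add part of the new candidate.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    # Hidden state: gated read-out of the cell state.
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(weights, spectrogram, n_hidden):
    """Feed frames in temporal order; return the hidden state after each frame."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    states = []
    for frame in spectrogram:
        h, c = lstm_step(weights, frame, h, c)
        states.append(h)
    return states


# Toy input: 6 time frames x 4 frequency bands of random "energy" values.
rng = random.Random(1)
spectrogram = [[rng.random() for _ in range(4)] for _ in range(6)]

weights = init_lstm(n_in=4, n_hidden=3)
states = run_lstm(weights, spectrogram, n_hidden=3)

for t, h in enumerate(states):
    print(t, [round(v, 3) for v in h])
```

The list `states` is a time series of hidden-unit activations, one vector per input frame. In the study, it is internal dynamics of this kind, taken from trained networks, that are compared against human neural responses to the same stimuli; how that comparison is carried out is not specified in the abstract and is not modeled here.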