Department of Speech, Hearing and Phonetic Sciences, University College London, London WC1N 1PF, United Kingdom.
School of Digital Humanities and Computational Social Sciences, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea.
eNeuro. 2024 Aug 20;11(8). doi: 10.1523/ENEURO.0507-23.2024. Print 2024 Aug.
Adults heard recordings of two spatially separated speakers reading newspaper and magazine articles. They were asked to listen to one of them and ignore the other, and EEG was recorded to assess their neural processing. Machine learning extracted neural sources that tracked the target and distractor speakers at three levels: the acoustic envelope of speech (delta- and theta-band modulations), lexical frequency for individual words, and the contextual predictability of individual words estimated by GPT-4 and earlier lexical models. To provide a broader view of speech perception, half of the subjects completed a simultaneous visual task, and the listeners included both native and non-native English speakers. Distinct neural components were extracted for these levels of auditory and lexical processing, demonstrating that native English speakers had greater target-distractor separation compared with non-native English speakers on most measures, and that lexical processing was reduced by the visual task. Moreover, there was a novel interaction of lexical predictability and frequency with auditory processing; acoustic tracking was stronger for lexically harder words, suggesting that people listened harder to the acoustics when needed for lexical selection. This demonstrates that speech perception is not simply a feedforward process from acoustic processing to the lexicon. Rather, the adaptable context-sensitive processing long known to occur at a lexical level has broader consequences for perception, coupling with the acoustic tracking of individual speakers in noise.
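The contextual-predictability measure described above is, in essence, a word-level surprisal taken from a language model. The sketch below is not the authors' code; it illustrates the general idea with a small open causal model (GPT-2 via Hugging Face) standing in for GPT-4, and the function name word_surprisals and the toy sentence are assumptions made only so the example is self-contained and runnable.

```python
# Minimal sketch: word-level surprisal (-log2 p of a word given its preceding
# context) from a causal language model. GPT-2 is used here as a stand-in for
# the GPT-4 estimates reported in the study.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def word_surprisals(words):
    """Return one surprisal value (in bits) per word, summing over sub-word tokens."""
    text = " ".join(words)
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits          # (1, n_tokens, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)

    ids = enc["input_ids"][0]
    # Token i is predicted from position i-1; the first token has no context.
    token_surprisal = torch.full((len(ids),), float("nan"))
    token_surprisal[1:] = -log_probs[0, :-1, :].gather(
        1, ids[1:].unsqueeze(1)
    ).squeeze(1) / torch.log(torch.tensor(2.0))  # nats -> bits

    # Map sub-word tokens back to words and sum surprisal within each word.
    word_ids = enc.word_ids(0)
    per_word = [0.0] * len(words)
    for tok_idx, w_idx in enumerate(word_ids):
        if w_idx is not None and tok_idx > 0:  # skip the context-free first token
            per_word[w_idx] += float(token_surprisal[tok_idx])
    return per_word

print(word_surprisals("the listener tracked the target speaker".split()))
```

In a regression or temporal-response-function analysis, such per-word surprisal values would be placed as impulses at word onsets and related to the EEG, alongside the acoustic envelope and lexical-frequency predictors.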