Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America.
Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America.
J Neural Eng. 2021 Mar 23;18(4). doi: 10.1088/1741-2552/abecf0.
Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in their acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds). We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine (SVM) classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe the later decision stages of categorization (300-800 ms), and these areas were highly associated with the strength of listeners' categorical hearing (i.e. the slope of behavioral identification functions). Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
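To make the decoding pipeline concrete, the following is a minimal sketch (not the authors' code) of time-resolved SVM decoding with a stability-selection step. It assumes source-level ERP data arranged as trials × ROIs × time samples (68 ROIs, matching a Desikan-Killiany-style parcellation); the synthetic data and all variable names are illustrative, and an L1-penalized logistic regression stands in as the sparse base learner for stability selection.

```python
"""Sketch: time-resolved decoding of prototypical vs. ambiguous speech tokens
from source-level ERPs, followed by stability selection over ROIs.
All data here are synthetic placeholders, not the study's recordings."""
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 200 trials x 68 ROIs x 100 time samples.
n_trials, n_rois, n_times = 200, 68, 100
X = rng.standard_normal((n_trials, n_rois, n_times))
y = rng.integers(0, 2, n_trials)  # 1 = prototypical, 0 = ambiguous token

# Decode category from the whole-brain ROI pattern at each time point,
# i.e. ask "when" CP first becomes readable from the neural response.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = np.array([
    cross_val_score(clf, X[:, :, t], y, cv=5).mean()
    for t in range(n_times)
])
t_peak = int(acc.argmax())
print(f"peak decoding accuracy {acc[t_peak]:.2%} at sample {t_peak}")

# Stability selection (sketch): refit a sparse classifier on random
# half-samples and keep ROIs selected in a large fraction of refits,
# i.e. ask "where" the categorical information resides.
n_boot = 100
freq = np.zeros(n_rois)
Xt = X[:, :, t_peak]  # ROI pattern at the peak decoding latency
for _ in range(n_boot):
    idx = rng.choice(n_trials, n_trials // 2, replace=False)
    sparse_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    sparse_clf.fit(Xt[idx], y[idx])
    freq += np.abs(sparse_clf.coef_[0]) > 1e-8  # count nonzero ROI weights
stable_rois = np.where(freq / n_boot >= 0.6)[0]  # 0.6 threshold is illustrative
print("stable ROIs:", stable_rois)
```

On real data, repeating the selection step within encoding (0-260 ms) and decision (300-800 ms) windows would yield compact ROI sets for each stage, analogous to the 13- and 15-ROI networks reported above.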