
Data-driven machine learning models for decoding speech categorization from evoked brain responses.

Affiliations

Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America.

Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America.

Publication Information

J Neural Eng. 2021 Mar 23;18(4). doi: 10.1088/1741-2552/abecf0.

Abstract

Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds). We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine (SVM) classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe later decision stages (300-800 ms) of categorization, and these areas were highly associated with the strength of listeners' categorical hearing (i.e. the slope of behavioral identification functions). Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
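The decoding pipeline summarized above (time-resolved SVM classification of source-level ERPs, followed by stability selection to identify which ROIs carry categorical information) can be illustrated with a minimal Python sketch. This is not the authors' code: the data shapes, time windows, subsample count, and selection threshold below are hypothetical placeholders standing in for the study's 68-ROI source data.

```python
# Minimal sketch of time-resolved SVM decoding + stability selection.
# Assumed (hypothetical) data layout:
#   X_time: (n_trials, n_rois, n_times) source-level ERP amplitudes
#   y:      binary labels (1 = prototypical token, 0 = ambiguous token)
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_rois, n_times = 200, 68, 100          # placeholder sizes
X_time = rng.standard_normal((n_trials, n_rois, n_times))
y = rng.integers(0, 2, n_trials)

# 1) Decode category at each time point to find when information emerges.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = np.array([
    cross_val_score(clf, X_time[:, :, t], y, cv=cv).mean()
    for t in range(n_times)
])
t_peak = int(acc.argmax())
print(f"peak decoding accuracy {acc[t_peak]:.2%} at time index {t_peak}")

# 2) Stability selection over ROIs: refit a sparse (L1) model on random
#    half-subsamples and keep ROIs selected in most resamples.
X_enc = X_time[:, :, :26].mean(axis=2)            # e.g. an early encoding window
sel_counts = np.zeros(n_rois)
n_resamples = 100
for _ in range(n_resamples):
    idx = rng.choice(n_trials, size=n_trials // 2, replace=False)
    sparse = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    sparse.fit(X_enc[idx], y[idx])
    sel_counts += (np.abs(sparse.coef_[0]) > 1e-8)

stable_rois = np.where(sel_counts / n_resamples >= 0.75)[0]
print("stable ROIs:", stable_rois)
```

The design rationale for the second step: stability selection keeps only features that survive repeated subsampling, which controls false discoveries better than reading raw classifier weights and is how a compact set of ROIs (13 for encoding, 15 for decision stages in the paper) can be reported.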
