Department of Communication Sciences and Disorders, The University of Texas at Austin.
Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh.
J Speech Lang Hear Res. 2019 Mar 25;62(3):587-601. doi: 10.1044/2018_JSLHR-S-ASTM-18-0244.
Purpose: Speech-evoked neurophysiological responses are often collected to answer clinically and theoretically driven questions concerning speech and language processing. Here, we highlight the practical application of machine learning (ML)-based approaches to analyzing speech-evoked neurophysiological responses.
Method: Two categories of ML-based approaches are introduced: decoding models, which generate a speech stimulus output using the features from the neurophysiological responses, and encoding models, which use speech stimulus features to predict neurophysiological responses. In this review, we focus on (a) a decoding model classification approach, wherein speech-evoked neurophysiological responses are classified as belonging to 1 of a finite set of possible speech events (e.g., phonological categories), and (b) an encoding model temporal response function approach, which quantifies the transformation of a speech stimulus feature to continuous neural activity.
Results: We illustrate the utility of the classification approach in two settings: analyzing early electroencephalographic (EEG) responses to Mandarin lexical tone categories from a traditional experimental design, and classifying EEG responses to English phonemes evoked by natural continuous speech (i.e., an audiobook) into phonological categories (plosive, fricative, nasal, and vowel). We also demonstrate the utility of the temporal response function approach for predicting EEG responses to natural continuous speech from acoustic features. Neural metrics from the 3 examples all exhibit statistically significant effects at the individual level.
Conclusion: We propose that ML-based approaches can complement traditional approaches to analyzing neurophysiological responses to speech signals and can provide a deeper understanding of natural speech and language processing using ecologically valid paradigms in both typical and clinical populations.
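To make the encoding-model idea concrete, the sketch below estimates a temporal response function as a ridge regression from time-lagged copies of one stimulus feature onto one response channel, then scores it by correlating predicted and observed responses on held-out samples. It is an illustration only, not the authors' pipeline: the data are simulated, and the function names (`lagged_design`, `fit_trf`, `predict`), the ridge penalty, the number of lags, and the "true" filter are all assumptions chosen for the example.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Stack time-lagged copies of a 1-D stimulus feature into an
    (n_samples, n_lags) matrix; column k holds the stimulus delayed by k samples."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]  # samples before onset stay zero
    return X

def fit_trf(stimulus, response, n_lags, ridge=1.0):
    """Ridge-regression estimate of the filter: w = (X'X + ridge * I)^-1 X'y."""
    X = lagged_design(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ response)

def predict(stimulus, weights):
    """Predict the response by applying the estimated filter to the stimulus."""
    return lagged_design(stimulus, len(weights)) @ weights

# Simulated data (hypothetical): a stand-in "speech envelope" and one "EEG"
# channel generated by filtering the envelope and adding noise.
rng = np.random.default_rng(0)
true_trf = np.array([0.0, 0.5, 1.0, 0.5, -0.3, -0.1])
envelope = rng.standard_normal(4000)
eeg = lagged_design(envelope, len(true_trf)) @ true_trf
eeg = eeg + 0.5 * rng.standard_normal(4000)

# Fit on the first half, evaluate on the second half.
# The first few held-out samples lack stimulus history, a minor edge effect.
half = 2000
weights = fit_trf(envelope[:half], eeg[:half], n_lags=len(true_trf), ridge=1.0)
predicted = predict(envelope[half:], weights)
accuracy = np.corrcoef(predicted, eeg[half:])[0, 1]
print("estimated TRF:", np.round(weights, 2))
print("held-out prediction correlation:", round(float(accuracy), 3))
```

With real recordings, the penalty and the lag window would normally be chosen by cross-validation, and the prediction correlation would be compared against a null distribution to test significance at the individual level.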