Akram Sahar, Presacco Alessandro, Simon Jonathan Z, Shamma Shihab A, Babadi Behtash
Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA; Institute for Systems Research, University of Maryland, College Park, MD 20742, USA.
Department of Hearing and Speech Science, University of Maryland, College Park, MD 20742, USA.
Neuroimage. 2016 Jan 1;124(Pt A):906-917. doi: 10.1016/j.neuroimage.2015.09.048. Epub 2015 Oct 4.
How the human brain solves the cocktail party problem remains largely unknown. Recent neuroimaging studies, however, suggest salient temporal correlations between the auditory neural response and the attended auditory object. Using magnetoencephalography (MEG) recordings of neural responses from human subjects, we propose a decoding approach for tracking the attentional state while subjects selectively listen to one of two speech streams in a competing-speaker environment. We develop a biophysically inspired state-space model to account for the modulation of the neural response by the attentional state of the listener. The decoder is based on a maximum a posteriori (MAP) estimate of the state parameters obtained via the Expectation-Maximization (EM) algorithm. Using only the envelopes of the two speech streams as covariates, the proposed decoder tracks the attentional state of the listener with a temporal resolution on the order of seconds, together with statistical confidence intervals. We evaluate the performance of the proposed model using numerical simulations and experimentally measured evoked MEG responses from the human brain. Our analysis reveals considerable performance gains provided by the state-space model in terms of temporal resolution, computational complexity and decoding accuracy.
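To illustrate the general idea of tracking an attentional state with confidence intervals from envelope covariates, the following is a minimal sketch, not the model described in the abstract. It assumes a much simpler linear-Gaussian random-walk state with fixed noise variances (no EM estimation), and it assumes a per-window correlation difference as the observed feature. The function names, the parameters `q` and `r`, the window length, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def window_correlation_difference(response, env1, env2, win):
    """Per window: corr(response, env1) - corr(response, env2).

    Assumed feature for this sketch; positive values favor speaker 1.
    """
    n_win = len(response) // win
    d = np.empty(n_win)
    for k in range(n_win):
        s = slice(k * win, (k + 1) * win)
        c1 = np.corrcoef(response[s], env1[s])[0, 1]
        c2 = np.corrcoef(response[s], env2[s])[0, 1]
        d[k] = c1 - c2
    return d


def smooth_random_walk(d, q=0.01, r=0.1):
    """Kalman filter plus Rauch-Tung-Striebel smoother.

    Assumed model: z_k = z_{k-1} + w_k, w_k ~ N(0, q)
                   d_k = z_k + v_k,     v_k ~ N(0, r)
    q and r are fixed by hand here rather than estimated.
    Returns the smoothed posterior mean and variance of z_k.
    """
    n = len(d)
    m_pred, p_pred = np.empty(n), np.empty(n)
    m_filt, p_filt = np.empty(n), np.empty(n)
    m, p = 0.0, 1.0  # broad prior centered on "no preference"
    for k in range(n):
        m_pred[k], p_pred[k] = m, p + q
        gain = p_pred[k] / (p_pred[k] + r)
        m = m_pred[k] + gain * (d[k] - m_pred[k])
        p = (1.0 - gain) * p_pred[k]
        m_filt[k], p_filt[k] = m, p
    m_s, p_s = m_filt.copy(), p_filt.copy()
    for k in range(n - 2, -1, -1):  # backward smoothing pass
        g = p_filt[k] / p_pred[k + 1]
        m_s[k] = m_filt[k] + g * (m_s[k + 1] - m_pred[k + 1])
        p_s[k] = p_filt[k] + g ** 2 * (p_s[k + 1] - p_pred[k + 1])
    return m_s, p_s


# Synthetic example: the simulated listener attends speaker 1 during the
# first half and speaker 2 during the second half.
rng = np.random.default_rng(0)
n_samples, win = 8000, 200
env1 = np.abs(rng.standard_normal(n_samples))
env2 = np.abs(rng.standard_normal(n_samples))
attended = np.where(np.arange(n_samples) < n_samples // 2, env1, env2)
response = attended + 0.5 * rng.standard_normal(n_samples)

d = window_correlation_difference(response, env1, env2, win)
mean, var = smooth_random_walk(d)
lower = mean - 1.96 * np.sqrt(var)  # approximate 95% interval
upper = mean + 1.96 * np.sqrt(var)
decoded_speaker = np.where(mean > 0, 1, 2)
```

In this sketch the sign of the smoothed state indicates the decoded speaker for each window, and the interval `[lower, upper]` gives a rough measure of confidence; windows whose interval straddles zero would be treated as ambiguous.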