Alickovic Emina, Lunner Thomas, Gustafsson Fredrik, Ljung Lennart
Department of Electrical Engineering, Linkoping University, Linkoping, Sweden.
Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark.
Front Neurosci. 2019 Mar 19;13:153. doi: 10.3389/fnins.2019.00153. eCollection 2019.
Auditory attention identification methods attempt to identify the sound source a listener is attending to by analyzing electrophysiological measurements. We present a tutorial on the numerous techniques developed in recent decades, together with an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear combination of the electrophysiological data that best correlates with the audio data and a corresponding linear combination of the audio data that best correlates with the electrophysiological data. Model-based (encoding and decoding) approaches focus on one of these two sets. We investigate the similarities and differences between these linear model philosophies. We focus on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and on comparing the methodologies of state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal processing pipeline for identifying the attended sound source in a cocktail party environment from raw electrophysiological data, covering all the necessary steps and complemented with the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods and to provide numerical illustrations for some of them to give a sense of their potential. A thorough performance comparison is outside the scope of this tutorial.
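To make the decoding idea in the abstract concrete, the following is a minimal Python sketch of a dense (ridge-regularized) linear decoder. It reconstructs a sound envelope from time-lagged multichannel data and picks the candidate source whose envelope correlates best with the reconstruction. It is not the tutorial's MATLAB code. The function names (`lag_matrix`, `fit_decoder`, `identify_attended`), the lag count, the ridge parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def lag_matrix(eeg, n_lags):
    """Stack time-lagged copies of each channel: (T, C) -> (T, C * n_lags)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        # Row t of lag block k holds eeg[t + k], since the response follows the stimulus.
        X[: T - k, k * C:(k + 1) * C] = eeg[k:]
    return X


def fit_decoder(eeg, envelope, n_lags=8, ridge=1.0):
    """Ridge-regularized least squares mapping lagged EEG to the envelope."""
    X = lag_matrix(eeg, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)


def identify_attended(eeg, envelopes, weights, n_lags=8):
    """Return the index of the envelope best correlated with the reconstruction."""
    recon = lag_matrix(eeg, n_lags) @ weights
    scores = [float(np.corrcoef(recon, env)[0, 1]) for env in envelopes]
    return int(np.argmax(scores)), scores


# Synthetic stand-in data (assumption): 16 channels that carry a delayed,
# noisy copy of the attended envelope and nothing of the ignored one.
rng = np.random.default_rng(0)
T, C, delay = 2000, 16, 3
attended = rng.standard_normal(T)
ignored = rng.standard_normal(T)
mixing = rng.standard_normal(C)
eeg = np.outer(np.roll(attended, delay), mixing) + rng.standard_normal((T, C))

# Train on the first half, classify attention on the second half.
half = T // 2
weights = fit_decoder(eeg[:half], attended[:half])
attended_idx, scores = identify_attended(
    eeg[half:], [attended[half:], ignored[half:]], weights
)
print("selected source:", attended_idx)
```

An encoding model would reverse the direction of the regression, predicting each channel from the lagged envelope, and a sparse variant would replace the ridge penalty with an l1 penalty. Neither is shown here.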