Liu Nian, Chen Huawei, Songgong Kunkun, Li Yanwen
College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
J Acoust Soc Am. 2021 Feb;149(2):1069. doi: 10.1121/10.0003445.
Sound source localization in noisy and reverberant rooms using microphone arrays remains a challenging task, especially for small-sized arrays. Recent years have seen promising advances in deep-learning-assisted approaches that reformulate sound localization as a classification problem. A key to these approaches lies in effectively extracting sound location features under noisy and reverberant conditions. The most widely adopted features are based on the well-established generalized cross-correlation with phase transform (GCC-PHAT), which is known to help combat room reverberation. However, GCC-PHAT features may not be applicable to small-sized arrays. This paper proposes a deep-learning-assisted sound localization method using a small-sized microphone array constructed from two orthogonal first-order differential microphone arrays. An improved feature extraction scheme based on sound intensity estimation is also proposed: the correlation between the sound pressure and particle velocity components is decoupled in the construction of the whitening weighting, which enhances the robustness of the time-frequency bin-wise sound intensity features. Simulation and real-world experimental results show that the proposed approach can achieve higher spatial resolution and outperforms its state-of-the-art counterparts that use GCC-PHAT or sound intensity features for small-sized arrays in noisy and reverberant environments.
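For readers unfamiliar with the GCC-PHAT baseline named in the abstract, the following is a minimal sketch of the textbook two-microphone delay estimator. It is not the authors' feature extraction pipeline. The function name `gcc_phat`, its parameters, and the small regularization constant are assumptions chosen for illustration.

```python
import numpy as np


def gcc_phat(x, y, fs):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Illustrative sketch only: integer-sample resolution, single frame,
    no sub-sample interpolation.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum; the phase carries the inter-channel delay.
    cross = Y * np.conj(X)

    # PHAT weighting: discard magnitude, keep phase only (whitening).
    # The small constant (an assumption) avoids division by zero.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain, then reorder so lag 0 sits at index `half`.
    cc = np.fft.irfft(cross, n=n)
    half = n // 2
    cc = np.concatenate((cc[-half:], cc[:half + 1]))

    # The peak location gives the delay in samples.
    lag = int(np.argmax(cc)) - half
    return lag / fs
```

A positive return value means `y` lags `x`. As a purely illustrative calculation, not a figure from the paper, a hypothetical microphone spacing of 1 cm at a 16 kHz sampling rate gives a maximum inter-microphone delay of about 0.01 / 343 s, or roughly 0.47 samples. That is below the integer-lag resolution of this sketch, which gives some intuition for why delay-based features can be difficult to use with very small arrays.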