Zhang Shouming, Zhang Yaling, Liao Yixiao, Pang Kunkun, Wan Zhiyong, Zhou Songbin
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China.
Institute of Intelligent Manufacturing, Guangdong Academy of Science, Guangdong Key Laboratory of Modern Control Technology, Guangzhou 510030, China.
Math Biosci Eng. 2024 Jan 8;21(2):2004-2023. doi: 10.3934/mbe.2024089.
Sound event localization and detection has been applied in various fields. Polyphony and noise interference make it challenging to accurately predict sound events and the locations where they occur. To address this problem, we propose a Multiple Attention Fusion ResNet, which uses ResNet34 as the base network. Because sound durations are not fixed and multiple polyphonic sources and noise are present, we introduce the Gated Channel Transform to enhance the residual basic block. This enables the model to capture contextual information, evaluate channel weights, and reduce the interference caused by polyphony and noise. Furthermore, Split Attention is introduced to capture cross-channel information, which improves the model's ability to distinguish polyphonic events. Finally, Coordinate Attention is introduced so that the model can attend to both the channel information and the spatial location information of sound events. Experiments were conducted on two datasets, TAU-NIGENS Spatial Sound Events 2020 and TAU-NIGENS Spatial Sound Events 2021. The results demonstrate that the proposed model significantly outperforms state-of-the-art methods in environments with multiple polyphonic sources and directional noise interference, and that it achieves competitive performance in a single polyphonic environment.
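The abstract does not give the details of the gated residual block, so the following is only a minimal NumPy sketch of a Gated Channel Transform as it is commonly formulated: a per-channel L2-norm embedding, normalization across channels, and a `1 + tanh` gate. The function name `gated_channel_transform`, the parameter names `alpha`, `gamma`, `beta`, the `eps` value, and the `(N, C, H, W)` layout are assumptions for illustration; where the gate sits inside the ResNet34 basic block in the proposed model is not specified here.

```python
import numpy as np


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Illustrative gated channel transform (assumed formulation).

    x:                  feature map of shape (N, C, H, W)
    alpha, gamma, beta: per-channel parameters of shape (1, C, 1, 1)
    """
    # Global context embedding: scaled L2 norm of each channel over H and W.
    embedding = alpha * np.sqrt((x ** 2).sum(axis=(2, 3), keepdims=True) + eps)
    # Channel normalization: divide by the RMS of the embeddings across channels,
    # so channels compete with each other.
    rms = np.sqrt((embedding ** 2).mean(axis=1, keepdims=True) + eps)
    # Gating: 1 + tanh(.) lies in [0, 2]; gamma = beta = 0 gives an identity mapping.
    gate = 1.0 + np.tanh(gamma * embedding / rms + beta)
    return x * gate


# Usage with hypothetical sizes: 2 clips, 4 channels, 3 x 5 time-frequency bins.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 3, 5))
ones = np.ones((1, 4, 1, 1))
zeros = np.zeros((1, 4, 1, 1))
gated = gated_channel_transform(features, alpha=ones, gamma=ones, beta=zeros)
```

With `gamma` and `beta` set to zero the gate equals 1 and the block passes its input through unchanged, which is why such a gate can be added to a residual block without disturbing it at initialization.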