Two-Stage Unet with Gated-Conv Fusion for Binaural Audio Synthesis.

Authors

Zhang Wenjie, He Changjun, Cao Yinghan, Xu Shiyun, Wang Mingjiang

Affiliation

Key Laboratory for Key Technologies of IoT Terminals, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China.

Publication information

Sensors (Basel). 2025 Mar 13;25(6):1790. doi: 10.3390/s25061790.

Abstract

Binaural audio is crucial for creating immersive auditory experiences. However, capturing binaural audio in real-world environments is costly and technically complex, which has led to increasing interest in synthesizing binaural audio from monaural sources. In this paper, we propose a two-stage framework for binaural audio synthesis. Monaural audio is first transformed into a preliminary binaural signal, from which two parts are extracted: the common portion shared by the left and right channels, and the differential portion that is distinct in each channel. The POS-ORI self-attention module (POSA) is then introduced to integrate the spatial information of the sound sources and capture their motion. Based on this representation, the common and differential components are reconstructed separately. Finally, the gated-convolutional fusion module (GCFM) combines the reconstructed components to generate the final binaural audio. Experimental results demonstrate that the proposed method can accurately synthesize binaural audio and achieves state-of-the-art performance in phase estimation (Phase-l2: 0.789, Wave-l2: 0.147, Amplitude-l2: 0.036).
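The abstract does not define how the common and differential portions are computed, so the following is only a minimal sketch under stated assumptions: the common part is taken as the mid signal (L + R) / 2, the differential part as the side signal (L - R) / 2, and the gate is a plain element-wise sigmoid rather than the paper's convolutional GCFM. All function names are hypothetical, and `wave_l2` assumes the Wave-l2 metric is a mean squared error on the waveform.

```python
import numpy as np


def split_common_diff(left, right):
    """Assumed decomposition: mid (common) and side (differential) signals."""
    common = 0.5 * (left + right)
    diff = 0.5 * (left - right)
    return common, diff


def merge_common_diff(common, diff):
    """Inverse of split_common_diff: rebuild the left and right channels."""
    return common + diff, common - diff


def gated_fusion(common, diff, gate_logits):
    """Illustrative element-wise gate, NOT the paper's GCFM.

    A sigmoid gate in (0, 1) scales how much of the differential part
    is mixed back into each channel.
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return common + gate * diff, common - gate * diff


def wave_l2(pred, target):
    """Assumed Wave-l2: mean squared error between two waveforms."""
    return float(np.mean((pred - target) ** 2))
```

With a gate close to 1 the fusion reduces to the exact reconstruction from `merge_common_diff`; with a gate close to 0 both channels collapse to the common signal, that is, a mono output with no spatial cue.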

Figure (sensors-25-01790-g001):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/072f/11946021/2ac7d990cee9/sensors-25-01790-g001.jpg
