Wang Heming, Zhang Xueliang, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, USA.
Department of Computer Science, Inner Mongolia University, China.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7757-7761. doi: 10.1109/icassp43922.2022.9746374. Epub 2022 Apr 27.
Bone-conduction (BC) microphones capture speech signals by converting the vibrations of the human skull into electrical signals. BC sensors are insensitive to acoustic noise, but limited in bandwidth. On the other hand, conventional or air-conduction (AC) microphones are capable of capturing full-band speech, but are susceptible to background noise. We propose to combine the strengths of AC and BC microphones by employing a convolutional recurrent network that performs complex spectral mapping. To better utilize signals from both kinds of microphones, we employ attention-based fusion with early-fusion and late-fusion strategies. Experiments demonstrate the superiority of the proposed method over other recent speech enhancement methods that combine BC and AC signals. In addition, our enhancement performance is significantly better than that of conventional speech enhancement counterparts, especially in low signal-to-noise ratio scenarios.
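The idea of attention-based fusion can be illustrated with a minimal sketch. It is not the network described in the abstract: the function names `softmax` and `attention_fuse`, the per-bin attention scores, and the toy spectra are all assumptions made for illustration. In a trained model the scores would be produced by learned layers, and the fusion would operate on feature maps inside the convolutional recurrent network rather than directly on spectra. The sketch only shows how softmax weights can blend complex-valued (real plus imaginary) AC and BC spectral bins for one frame.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(ac_bins, bc_bins, ac_scores, bc_scores):
    """Blend AC and BC complex spectral bins with per-bin attention weights.

    ac_bins, bc_bins: complex STFT values for one frame (hypothetical inputs).
    ac_scores, bc_scores: unnormalized attention scores per bin; hand-set
    here, whereas a real model would learn to predict them.
    Returns the fused bins and the (w_ac, w_bc) weight pair for each bin.
    """
    fused, weights = [], []
    for a, b, sa, sb in zip(ac_bins, bc_bins, ac_scores, bc_scores):
        w_ac, w_bc = softmax([sa, sb])
        fused.append(w_ac * a + w_bc * b)
        weights.append((w_ac, w_bc))
    return fused, weights


if __name__ == "__main__":
    # Toy frame: bin 0 is low frequency (both sensors useful),
    # bin 1 is high frequency (BC is band-limited, so AC is favored).
    ac = [1 + 1j, 2 + 0j]
    bc = [1 + 0j, 0j]
    fused, weights = attention_fuse(ac, bc, [0.0, 5.0], [0.0, -5.0])
    for k, (f, w) in enumerate(zip(fused, weights)):
        print(k, f, w)
```

Equal scores give each sensor half the weight, while a large score gap lets one sensor dominate a bin. This mirrors the intuition that the band-limited BC signal should contribute mainly where it carries information.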