A lightweight speech enhancement network fusing bone- and air-conducted speech.

Affiliations

Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China.

University of Chinese Academy of Sciences, Beijing 100049, China.

Publication information

J Acoust Soc Am. 2024 Aug 1;156(2):1355-1366. doi: 10.1121/10.0028339.

DOI: 10.1121/10.0028339
PMID: 39185901

Abstract

Air-conducted (AC) microphones capture high-quality desired speech together with ambient noise, whereas bone-conducted (BC) microphones are immune to ambient noise but capture only band-limited speech. This paper proposes a speech enhancement model that leverages the merits of both BC and AC speech. The proposed model takes the spectrograms of BC and AC speech as input and fuses them with an attention-based feature fusion module. The backbone network of the proposed model uses the fused signals to estimate a mask of the target speech, which is then applied to the noisy AC speech to recover the target speech. As its backbone, the proposed model adopts a lightweight densely gated convolutional attention network (DenGCAN) comprising an encoder, bottleneck layers, and a decoder. Furthermore, this paper improves an attention gate and integrates it into the skip connections of DenGCAN, allowing the decoder to focus on the key areas of the feature map extracted by the encoder. Because DenGCAN adopts a self-attention mechanism, the proposed model has the potential to improve noise reduction performance at the expense of increased input-output latency. Experimental results demonstrate that the speech enhanced by the proposed model achieves an average wideband PESQ improvement of 1.870 over the noisy AC speech.
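The abstract describes a three-stage signal flow: attention-based fusion of the BC and AC spectrograms, mask estimation from the fused features, and application of the mask to the noisy AC spectrogram. The plain-Python sketch below illustrates only that flow. It is a toy under stated assumptions, not the authors' DenGCAN. The per-bin softmax attention scores, the single logistic unit standing in for the backbone, every weight (`w_bc`, `w_ac`, `gain`, `bias`), and the input magnitudes are hypothetical. The improved attention gate, the self-attention layers, and any training are not modeled.

```python
import math

# A spectrogram here is a list of frames, each a list of non-negative
# magnitudes (one per frequency bin). Toy representation, not the paper's.
Spectrogram = list[list[float]]


def softmax2(s1: float, s2: float) -> tuple[float, float]:
    """Numerically stable softmax over two scores; the weights sum to 1."""
    m = max(s1, s2)
    e1, e2 = math.exp(s1 - m), math.exp(s2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


def attention_fuse(bc: Spectrogram, ac: Spectrogram,
                   w_bc: float, w_ac: float) -> Spectrogram:
    """Fuse BC and AC bin by bin with softmax attention weights.

    Hypothetical scoring: each source's score is a scalar weight times its
    log-compressed magnitude. A trained model would learn these scores.
    """
    fused = []
    for bc_frame, ac_frame in zip(bc, ac):
        row = []
        for x_bc, x_ac in zip(bc_frame, ac_frame):
            a_bc, a_ac = softmax2(w_bc * math.log1p(x_bc), w_ac * math.log1p(x_ac))
            row.append(a_bc * x_bc + a_ac * x_ac)  # convex combination of the two sources
        fused.append(row)
    return fused


def estimate_mask(fused: Spectrogram, gain: float, bias: float) -> Spectrogram:
    """Hypothetical stand-in for the backbone: one logistic unit per bin.

    The sigmoid keeps every mask value in [0, 1].
    """
    return [[1.0 / (1.0 + math.exp(-(gain * math.log1p(x) + bias))) for x in frame]
            for frame in fused]


def apply_mask(mask: Spectrogram, noisy_ac: Spectrogram) -> Spectrogram:
    """Scale the noisy AC magnitudes by the mask, bin by bin."""
    return [[m * y for m, y in zip(m_frame, y_frame)]
            for m_frame, y_frame in zip(mask, noisy_ac)]


# Made-up 2-frame, 3-bin example: BC carries only low-band energy,
# while AC is full-band but noisy.
bc = [[0.9, 0.2, 0.0], [0.8, 0.1, 0.0]]
ac = [[1.5, 1.2, 0.9], [1.4, 1.1, 1.0]]
fused = attention_fuse(bc, ac, w_bc=1.5, w_ac=1.0)
mask = estimate_mask(fused, gain=4.0, bias=-2.0)
enhanced = apply_mask(mask, ac)
```

The two attention weights sum to 1, so each fused bin lies between its BC and AC magnitudes. The mask is bounded in [0, 1], so the enhanced magnitude never exceeds the noisy AC magnitude. The sketch only rescales magnitudes. How the paper reconstructs the waveform (for example, whether it reuses the noisy AC phase) is not stated in the abstract.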

Similar articles

1
Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross-correlation bands.
Technol Health Care. 2018;26(S1):281-289. doi: 10.3233/THC-174615.
2
Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:3134-3143. doi: 10.1109/taslp.2022.3209943. Epub 2022 Sep 26.
3
Attention-based fusion for bone-conducted and air-conducted speech enhancement in the complex domain.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7757-7761. doi: 10.1109/icassp43922.2022.9746374. Epub 2022 Apr 27.
4
A Real-Time Dual-Microphone Speech Enhancement Algorithm Assisted by Bone Conduction Sensor.
Sensors (Basel). 2020 Sep 5;20(18):5050. doi: 10.3390/s20185050.
5
Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network.
PLoS One. 2023 May 11;18(5):e0285629. doi: 10.1371/journal.pone.0285629. eCollection 2023.
6
Model-based speech enhancement using a bone-conducted signal.
J Acoust Soc Am. 2012 Mar;131(3):EL262-7. doi: 10.1121/1.3687014.
7
Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.
PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.
8
A method for enhancing speech and warning signals based on parallel convolutional neural networks in a noisy environment.
Technol Health Care. 2021;29(S1):141-152. doi: 10.3233/THC-218015.
9
A Robust Dual-Microphone Generalized Sidelobe Canceller Using a Bone-Conduction Sensor for Speech Enhancement.
Sensors (Basel). 2021 Mar 8;21(5):1878. doi: 10.3390/s21051878.