Dual-path transformer-based network with equalization-generation components prediction for flexible vibrational sensor speech enhancement in the time domain.

Affiliations

High-tech Institute, Fan Gong-ting South Street on the 12th, Weifang 261000, China.

Command and Control Engineering College, Army Engineering University, Nanjing 210007, China.

Publication information

J Acoust Soc Am. 2022 May;151(5):2814. doi: 10.1121/10.0010316.

DOI:10.1121/10.0010316

PMID:35649897

Abstract

The flexible vibrational sensor (FVS) has the potential to become a popular wearable communication device because of its natural noise-shielding characteristics and soft materials. However, FVS speech suffers a severe loss of frequency components. To improve speech quality, a time-domain neural network model based on a dual-path transformer combined with equalization-generation components prediction (DPT-EGNet) is proposed. Specifically, DPT-EGNet consists of five modules: a pre-processing module, a dual-path transformer module, an equalization module, a generation module, and a post-processing module. The dual-path transformer module extracts the local and global contextual relationships of long-term speech sequences, which is highly beneficial for inferring the missing components. The equalization and generation modules are designed around the characteristics of FVS speech and further improve speech quality by modeling the inverse of the speech distortion process. Experimental results demonstrate that the proposed model effectively improves the quality of FVS speech: averaged over three male and three female speakers, the perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and composite measure for overall speech quality (COVL) scores achieve relative increases of 64.19%, 29.63%, and 101.37%, respectively, outperforming baseline models developed in other domains. The proposed model also has significantly lower complexity than these baselines.
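The abstract does not give DPT-EGNet's layer sizes or the exact design of its transformer blocks. The sketch below assumes the chunk-and-overlap segmentation popularized by dual-path RNN (DPRNN) and dual-path transformer models, and shows how a dual-path module can capture local context (attention within each chunk) and global context (attention across chunks) of a long time-domain feature sequence. The names `segment`, `overlap_add`, `self_attention`, `dual_path_block`, and `random_weights`, the chunk size, and the random projection weights are all hypothetical. A real transformer layer would also use multi-head attention, layer normalization, and a feed-forward network, and this is not the authors' implementation.

```python
import numpy as np


def segment(x, chunk):
    """Split features x of shape [N, L] into overlapping chunks [N, chunk, S]."""
    assert chunk >= 2, "chunk must be at least 2 frames"
    length = x.shape[1]
    hop = chunk // 2  # 50% overlap for even chunk sizes
    padded = length + 2 * hop
    extra = (hop - (padded - chunk) % hop) % hop  # so the chunks tile exactly
    xp = np.pad(x, ((0, 0), (hop, hop + extra)))
    num_chunks = (xp.shape[1] - chunk) // hop + 1
    return np.stack(
        [xp[:, s * hop : s * hop + chunk] for s in range(num_chunks)], axis=-1
    )


def overlap_add(chunks, length):
    """Inverse of segment(): average overlapping chunks back into [N, length]."""
    n, chunk, num_chunks = chunks.shape
    hop = chunk // 2
    total = (num_chunks - 1) * hop + chunk
    out = np.zeros((n, total))
    count = np.zeros(total)
    for s in range(num_chunks):
        out[:, s * hop : s * hop + chunk] += chunks[:, :, s]
        count[s * hop : s * hop + chunk] += 1
    return (out / count)[:, hop : hop + length]  # drop the front/back padding


def self_attention(seq, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of seq [T, N]."""
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time steps
    return seq + weights @ v  # residual connection


def dual_path_block(chunks, intra_w, inter_w):
    """One dual-path step: attention within each chunk, then across chunks."""
    out = chunks.copy()
    n, chunk, num_chunks = out.shape
    # Intra-chunk (local context): attend over the frames inside each chunk.
    for s in range(num_chunks):
        out[:, :, s] = self_attention(out[:, :, s].T, *intra_w).T
    # Inter-chunk (global context): attend over the same position in every chunk.
    for k in range(chunk):
        out[:, k, :] = self_attention(out[:, k, :].T, *inter_w).T
    return out


def random_weights(rng, n_feat, d):
    """Hypothetical stand-in for learned query/key/value projections."""
    return (
        rng.standard_normal((n_feat, d)) * 0.1,
        rng.standard_normal((n_feat, d)) * 0.1,
        rng.standard_normal((n_feat, n_feat)) * 0.1,
    )


rng = np.random.default_rng(0)
n_feat, length, chunk = 8, 100, 10
x = rng.standard_normal((n_feat, length))  # stand-in for encoded FVS frames
chunks = segment(x, chunk)
y = overlap_add(
    dual_path_block(chunks, random_weights(rng, n_feat, 4), random_weights(rng, n_feat, 4)),
    length,
)
print(chunks.shape, y.shape)  # (8, 10, 21) (8, 100)
```

With chunk length K and S chunks, intra-chunk attention costs on the order of S·K² and inter-chunk attention K·S², so choosing K and S near √L makes the cost grow roughly as L^1.5 instead of the L² of full attention over the whole sequence. This is one plausible contributor to a dual-path model's low complexity, although the abstract does not break down where DPT-EGNet's savings come from.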
