Regional Language Speech Recognition from Bone-Conducted Speech Signals through Different Deep Learning Architectures.

Author information

Department of Electronics and Communication Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India.

Center of Excellence for Bioprocess and Biotechnology, Department of Chemical Engineering, College of Biological and Chemical Engineering, Addis Ababa Science and Technology University, Addis Ababa, Ethiopia.

Publication information

Comput Intell Neurosci. 2022 Aug 25;2022:4473952. doi: 10.1155/2022/4473952. eCollection 2022.

Abstract

A bone-conduction microphone (BCM) senses vibrations of the skull bones during speech and converts them into an electrical audio signal. Because BCMs capture speech from the vibrations of the speaker's skull, they resist noise better than standard air-conduction microphones (ACMs). BCMs also have a different frequency response than ACMs because they capture only the low-frequency portion of speech signals. Replacing an ACM with a BCM may therefore give satisfactory noise suppression, but speech quality and intelligibility may suffer due to the nature of solid vibration. Mismatched BCM and ACM characteristics can also degrade automatic speech recognition (ASR) performance, and it is impossible to recreate a new ASR system using voice data from BCMs. The intelligibility of a BCM speech signal depends on the bone location used to acquire the signal and on how accurately the phonemes of words are modeled. Deep learning techniques such as neural networks have traditionally been used for speech recognition. However, neural networks have a high computational cost and are unable to model phonemes in signals. In this paper, the intelligibility of BCM speech signals was evaluated for different bone locations, namely the right ramus, larynx, and right mastoid. BCM signals were acquired for Tamil words, and speech intelligibility was evaluated by listeners and by deep learning architectures such as CapsuleNet, UNet, and S-Net. As validated by both the listeners and the deep learning architectures, the larynx location improves speech intelligibility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ce9/9436543/220b291157a3/CIN2022-4473952.001.jpg
