Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea.
Sci Rep. 2023 Apr 19;13(1):6414. doi: 10.1038/s41598-023-33755-2.
In this study, we present initial work toward a new speech recognition approach that produces alternative input images for convolutional neural network (CNN)-based speech recognition. We explored the potential of tympanic membrane (eardrum)-inspired viscoelastic membrane-type diaphragms to generate audio visualization images using a cross-recurrence plot (CRP). These images were formed from two phase-shifted vibration responses of the viscoelastic diaphragms. We expect this technique to replace the fast Fourier transform (FFT) spectrum currently used for speech recognition. Herein, we report that this new method of creating color images, which combines two phase-shifted vibration responses of viscoelastic diaphragms with a CRP, has a lower computational burden and is a promising alternative to the short-time Fourier transform (STFT), i.e., the conventional spectrogram, when the image resolution (pixel size) is below a critical resolution.
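As a rough illustration of the CRP step, the following is a minimal sketch of the standard binary cross-recurrence definition, CR(i, j) = Θ(ε − ‖x_i − y_j‖), applied to two signals. Two phase-shifted sinusoids stand in for the two diaphragm vibration responses. The embedding dimension, delay, threshold ε, sampling rate, and test tone are all assumptions chosen for illustration, not values from the study, and the color-image mapping used by the authors is not reproduced here.

```python
import numpy as np


def embed(s, dim, tau):
    """Time-delay embedding: row i is (s[i], s[i+tau], ..., s[i+(dim-1)*tau])."""
    s = np.asarray(s, dtype=float)
    n = len(s) - (dim - 1) * tau
    return np.stack([s[k * tau : k * tau + n] for k in range(dim)], axis=1)


def cross_recurrence_plot(x, y, eps, dim=2, tau=5):
    """Binary cross-recurrence matrix between signals x and y.

    CR[i, j] = 1 if the Euclidean distance between embedded state x_i and
    embedded state y_j is at most eps, else 0. dim, tau, and eps are
    illustrative assumptions, not parameters reported in the paper.
    """
    X = embed(x, dim, tau)
    Y = embed(y, dim, tau)
    # Pairwise distances between every state of x and every state of y.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return (dist <= eps).astype(np.uint8)


# Hypothetical stand-ins for the two diaphragm responses: the same tone
# observed with a 90-degree phase shift (assumed values).
fs = 8000                      # sampling rate in Hz
t = np.arange(256) / fs
f0 = 440.0                     # test tone in Hz
resp_a = np.sin(2 * np.pi * f0 * t)
resp_b = np.sin(2 * np.pi * f0 * t - np.pi / 2)

crp = cross_recurrence_plot(resp_a, resp_b, eps=0.1)
print(crp.shape)  # (251, 251): 256 samples minus (dim - 1) * tau = 5
```

The binary matrix can be viewed directly as an image; an unthresholded variant (keeping `dist` instead of comparing it with `eps`) yields a grayscale map that could be colorized. Either form is only a sketch of the general CRP technique, not the authors' pipeline.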