
Eardrum-inspired soft viscoelastic diaphragms for CNN-based speech recognition with audio visualization images.

Affiliation

Department of Mechanical Engineering, Inha University, 100 Inha-ro, Michuhol-gu, Incheon, 22212, Republic of Korea.

Publication

Sci Rep. 2023 Apr 19;13(1):6414. doi: 10.1038/s41598-023-33755-2.

DOI: 10.1038/s41598-023-33755-2
PMID: 37076548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10115895/
Abstract

In this study, we present initial efforts toward a new speech recognition approach aimed at producing different input images for convolutional neural network (CNN)-based speech recognition. We explored the potential of tympanic membrane (eardrum)-inspired viscoelastic membrane-type diaphragms to deliver audio visualization images using a cross-recurrence plot (CRP). These images were formed from the two phase-shifted vibration responses of the viscoelastic diaphragms. We expect this technique to replace the fast Fourier transform (FFT) spectrum currently used for speech recognition. Herein, we report that this new method of creating color images, enabled by combining the two phase-shifted vibration responses of viscoelastic diaphragms with a CRP, carries a lower computational burden and is a promising alternative to the STFT (conventional spectrogram) when the image resolution (pixel size) is below a critical resolution.
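The core idea — turning two phase-shifted signals into a binary image via a cross-recurrence plot — can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the toy sine signals, the phase shift, and the threshold `eps` are assumptions chosen for demonstration.

```python
import numpy as np

def cross_recurrence_plot(x, y, eps):
    """Binary CRP: entry (i, j) is 1 when |x[i] - y[j]| <= eps."""
    dist = np.abs(x[:, None] - y[None, :])  # pairwise distance matrix via broadcasting
    return (dist <= eps).astype(np.uint8)

# Toy stand-ins for the two phase-shifted diaphragm vibration responses
t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 5 * t)
y = np.sin(2 * np.pi * 5 * t + np.pi / 4)  # phase-shifted copy

crp = cross_recurrence_plot(x, y, eps=0.1)  # 200x200 binary image, usable as CNN input
```

In the paper the two signals come from physical viscoelastic diaphragms rather than synthetic sines, and the resulting images are fed to a CNN in place of an STFT spectrogram.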


Figures (Fig. 1–10, PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/ae269846586a/41598_2023_33755_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/82fb029fcf06/41598_2023_33755_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/8c027cae2c02/41598_2023_33755_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/5fd488fba0af/41598_2023_33755_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/8d3fb2666006/41598_2023_33755_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/50950614bb15/41598_2023_33755_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/f4c00a74346e/41598_2023_33755_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/89a803818e1b/41598_2023_33755_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/e1f2249be341/41598_2023_33755_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d2/10115895/e3ca4411e054/41598_2023_33755_Fig10_HTML.jpg

Similar Articles

1. Eardrum-inspired soft viscoelastic diaphragms for CNN-based speech recognition with audio visualization images. Sci Rep. 2023 Apr 19;13(1):6414. doi: 10.1038/s41598-023-33755-2.
2. A Hybrid Time-Distributed Deep Neural Architecture for Speech Emotion Recognition. Int J Neural Syst. 2022 Jun;32(6):2250024. doi: 10.1142/S0129065722500241. Epub 2022 May 12.
3. Cascaded Convolutional Neural Network Architecture for Speech Emotion Recognition in Noisy Conditions. Sensors (Basel). 2021 Jun 27;21(13):4399. doi: 10.3390/s21134399.
4. CNN-XGBoost fusion-based affective state recognition using EEG spectrogram image analysis. Sci Rep. 2022 Aug 19;12(1):14122. doi: 10.1038/s41598-022-18257-x.
5. Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method. Biomed Tech (Berl). 2021 Nov 29;66(6):613-625. doi: 10.1515/bmt-2021-0112. Print 2021 Dec 20.
6. Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS. Sensors (Basel). 2023 Feb 3;23(3):1743. doi: 10.3390/s23031743.
7. Lipreading Architecture Based on Multiple Convolutional Neural Networks for Sentence-Level Visual Speech Recognition. Sensors (Basel). 2021 Dec 23;22(1):72. doi: 10.3390/s22010072.
8. Voice Command Recognition Using Biologically Inspired Time-Frequency Representation and Convolutional Neural Networks. Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:998-1001. doi: 10.1109/EMBC44109.2020.9176006.
9. Automatic detection of tympanic membrane and middle ear infection from oto-endoscopic images via convolutional neural networks. Neural Netw. 2020 Jun;126:384-394. doi: 10.1016/j.neunet.2020.03.023. Epub 2020 Apr 1.
10. Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition. Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.
