Crossmixed convolutional neural network for digital speech recognition.

Affiliations

Faculty of Mechanical - Electrical and Computer Engineering, Van Lang University, Ho Chi Minh City, Vietnam.

Faculty of Information Technology, University of Finance-Marketing, Ho Chi Minh City, Vietnam.

Publication information

PLoS One. 2024 Apr 26;19(4):e0302394. doi: 10.1371/journal.pone.0302394. eCollection 2024.

DOI: 10.1371/journal.pone.0302394
PMID: 38669233
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11051591/
Abstract

Digital speech recognition is a challenging problem: it requires learning complex signal characteristics such as frequency, pitch, intensity, timbre, and melody, which traditional methods often struggle to recognize. This article introduces three solutions based on convolutional neural networks (CNNs): 1D-CNN is designed to learn directly from the digital data, while 2DS-CNN and 2DM-CNN have more complex architectures that use the Fourier transform to convert the raw waveform into images from which essential features are learned. Experimental results on four large data sets of 30,000 samples each show that the three proposed models outperform well-known models such as GoogLeNet and AlexNet, with best accuracies of 95.87%, 99.65%, and 99.76%, respectively. With 5-10% higher performance than other models, the proposed solutions demonstrate the ability to learn features effectively and to improve recognition accuracy and speed, opening up potential for broad applications in virtual assistants, medical recording, and voice commands.
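
The abstract states that the 2D models turn the raw waveform into images with a Fourier transform, but it gives no parameters. As a minimal sketch of that general idea, and not the authors' actual pipeline, the following computes a short-time Fourier magnitude image from a waveform. The frame length, hop size, Hann window, sample rate, and synthetic test tone are all assumptions chosen for illustration. Only the Python standard library is used so the example is self-contained; the direct DFT here is slow, and real code would normally call an FFT routine instead.

```python
import cmath
import math

FRAME_LEN = 64   # samples per analysis frame (assumed, not from the paper)
HOP = 32         # samples between frame starts (assumed)


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of a real frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return a 2D list: one row per time frame, one column per frequency bin."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        rows.append(dft_magnitudes(frame))
    return rows


# Synthetic input: a 1000 Hz sine sampled at 8000 Hz (hypothetical values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

image = spectrogram(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
peak_hz = peak_bin * sample_rate / FRAME_LEN
```

With these assumed settings the 512-sample tone yields a 15 x 33 image (frames x frequency bins), and the strongest bin of the first frame corresponds to 1000 Hz. A 2D CNN would take such an image as input, whereas a 1D CNN would consume `tone` directly.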

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/bf90001f5ed9/pone.0302394.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/63a282a8b949/pone.0302394.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/b57d1cfef86f/pone.0302394.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/174b5e38633d/pone.0302394.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/1dc54a411fcd/pone.0302394.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/ea755636fb15/pone.0302394.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/d17df773afda/pone.0302394.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/e098332d2ab7/pone.0302394.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/fc7e828a13c9/pone.0302394.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/3b3ac49c9a5c/pone.0302394.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75ab/11051591/15659a1f8397/pone.0302394.g011.jpg

Similar articles

1
Crossmixed convolutional neural network for digital speech recognition.
PLoS One. 2024 Apr 26;19(4):e0302394. doi: 10.1371/journal.pone.0302394. eCollection 2024.
2
White blood cells detection and classification based on regional convolutional neural networks.
Med Hypotheses. 2020 Feb;135:109472. doi: 10.1016/j.mehy.2019.109472. Epub 2019 Nov 4.
3
Voice Command Recognition Using Biologically Inspired Time-Frequency Representation and Convolutional Neural Networks.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:998-1001. doi: 10.1109/EMBC44109.2020.9176006.
4
Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms.
Sensors (Basel). 2021 Sep 1;21(17):5892. doi: 10.3390/s21175892.
5
Driver Fatigue Detection Based on Convolutional Neural Networks Using EM-CNN.
Comput Intell Neurosci. 2020 Nov 18;2020:7251280. doi: 10.1155/2020/7251280. eCollection 2020.
6
A Classroom Emotion Recognition Model Based on a Convolutional Neural Network Speech Emotion Algorithm.
Occup Ther Int. 2022 Jul 7;2022:9563877. doi: 10.1155/2022/9563877. eCollection 2022.
7
A Combined CNN Architecture for Speech Emotion Recognition.
Sensors (Basel). 2024 Sep 6;24(17):5797. doi: 10.3390/s24175797.
8
Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features.
Sensors (Basel). 2020 Sep 12;20(18):5212. doi: 10.3390/s20185212.
9
Classification of Acoustic Influences Registered with Phase-Sensitive OTDR Using Pattern Recognition Methods.
Sensors (Basel). 2023 Jan 4;23(2):582. doi: 10.3390/s23020582.
10
Specific Radar Recognition Based on Characteristics of Emitted Radio Waveforms Using Convolutional Neural Networks.
Sensors (Basel). 2021 Dec 9;21(24):8237. doi: 10.3390/s21248237.

Cited by

1
Classification of land lot shapes in real estate sector using a convolutional neural network.
PLoS One. 2024 Sep 19;19(9):e0308788. doi: 10.1371/journal.pone.0308788. eCollection 2024.

References

1
Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.
PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.
2
Many but not all deep neural network audio models capture brain responses and exhibit correspondence between model stages and brain regions.
PLoS Biol. 2023 Dec 13;21(12):e3002366. doi: 10.1371/journal.pbio.3002366. eCollection 2023 Dec.
3
Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest.
PLoS One. 2023 Nov 21;18(11):e0291500. doi: 10.1371/journal.pone.0291500. eCollection 2023.
4
A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.
PLoS Comput Biol. 2023 May 5;19(5):e1011110. doi: 10.1371/journal.pcbi.1011110. eCollection 2023 May.
5
Signal-to-signal neural networks for improved spike estimation from calcium imaging data.
PLoS Comput Biol. 2021 Mar 1;17(3):e1007921. doi: 10.1371/journal.pcbi.1007921. eCollection 2021 Mar.
6
Brain-optimized extraction of complex sound features that drive continuous auditory perception.
PLoS Comput Biol. 2020 Jul 2;16(7):e1007992. doi: 10.1371/journal.pcbi.1007992. eCollection 2020 Jul.
7
Towards the classification of heart sounds based on convolutional deep neural network.
Health Inf Sci Syst. 2019 Aug 7;7(1):16. doi: 10.1007/s13755-019-0078-0. eCollection 2019 Dec.