Sharan Roneel V, Berkovsky Shlomo, Liu Sidong
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:998-1001. doi: 10.1109/EMBC44109.2020.9176006.
Voice commands are an important interface between humans and technology in healthcare, for example in hands-free control of surgical robots and in patient care technology. Voice command recognition can be cast as a speech classification task, where convolutional neural networks (CNNs) have demonstrated strong performance. CNNs were originally developed for image classification, and time-frequency representations of the speech signal, of which several variants are in common use, are the most widely used image-like inputs to a CNN. This work investigates the cochleagram, a time-frequency representation computed with a gammatone filterbank that models the frequency selectivity of the human cochlea, as the representation of voice commands and the input to a CNN classifier. We also explore the multi-view CNN as a technique for combining learning from different time-frequency representations. The proposed method is evaluated on a large dataset and shown to achieve high classification accuracy.
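The abstract describes passing the speech signal through a gammatone filterbank and using the resulting time-frequency image (the cochleagram) as CNN input, but gives no implementation details. The sketch below is a minimal NumPy illustration under common assumptions: ERB-rate-spaced centre frequencies (Glasberg and Moore), fourth-order gammatone impulse responses, and frame-wise log energy. The function names (`erb_space`, `gammatone_ir`, `cochleagram`) and the frame parameters are our own, not the paper's.

```python
import numpy as np

def erb_space(low_hz, high_hz, n):
    """Centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    ear_q, min_bw = 9.26449, 24.7
    c = ear_q * min_bw
    return -c + np.exp(np.arange(1, n + 1)
                       * (np.log(low_hz + c) - np.log(high_hz + c)) / n) * (high_hz + c)

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """Impulse response of a gammatone filter at centre frequency fc."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)          # equivalent rectangular bandwidth
    b = 1.019 * erb                              # conventional bandwidth scaling
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def cochleagram(signal, fs, n_channels=64, frame_len=400, hop=160):
    """Filter the signal with a gammatone filterbank, then take frame-wise log energy.

    Returns an array of shape (n_channels, n_frames), an image-like input for a CNN.
    """
    fcs = erb_space(50, 0.9 * fs / 2, n_channels)
    out = []
    for fc in fcs:
        y = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        n_frames = 1 + (len(y) - frame_len) // hop
        frames = np.stack([y[i * hop: i * hop + frame_len] for i in range(n_frames)])
        out.append(np.log(np.mean(frames ** 2, axis=1) + 1e-10))
    return np.array(out)

# Example: one second of a 440 Hz tone at 16 kHz yields a 64-channel cochleagram.
fs = 16000
sig = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
cg = cochleagram(sig, fs)
```

With 64 channels, a 25 ms frame (400 samples) and a 10 ms hop (160 samples), one second of 16 kHz audio produces a 64 x 98 log-energy image; stacking such images from several representations (e.g. cochleagram and spectrogram) as parallel inputs is the kind of combination a multi-view CNN performs.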