Voice Command Recognition Using Biologically Inspired Time-Frequency Representation and Convolutional Neural Networks.

Author information

Sharan Roneel V, Berkovsky Shlomo, Liu Sidong

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:998-1001. doi: 10.1109/EMBC44109.2020.9176006.

Abstract

Voice commands are an important interface between humans and technology in healthcare, for example in hands-free control of surgical robots and in patient care technology. Voice command recognition can be cast as a speech classification task, on which convolutional neural networks (CNNs) have demonstrated strong performance. CNNs were originally developed for image classification, and a time-frequency representation of the speech signal is the most commonly used image-like input for them. Various types of time-frequency representation are used for this purpose. This work investigates the cochleagram, which uses a gammatone filter that models the frequency selectivity of the human cochlea, as the time-frequency representation of voice commands and as the input to the CNN classifier. We also explore a multi-view CNN as a technique for combining learning from different time-frequency representations. The proposed method is evaluated on a large dataset and shown to achieve high classification accuracy.
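To make the cochleagram idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no filterbank settings, so everything here is an assumption for illustration: the function names (`erb`, `erb_space`, `gammatone_ir`, `cochleagram`), a 4th-order gammatone impulse response, the Glasberg and Moore ERB formula with the commonly used bandwidth factor 1.019, 32 channels, 25 ms frames with a 10 ms hop, and log compression of frame energy.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def erb_space(f_low, f_high, n):
    """n center frequencies (Hz) equally spaced on the ERB-rate scale."""
    e_low = 21.4 * np.log10(4.37e-3 * f_low + 1.0)
    e_high = 21.4 * np.log10(4.37e-3 * f_high + 1.0)
    e = np.linspace(e_low, e_high, n)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_ir(fc, fs, duration=0.064, order=4):
    """Truncated gammatone impulse response, normalized to unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * erb(fc)  # assumed bandwidth scaling, not taken from the paper
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))

def cochleagram(x, fs, n_filters=32, f_low=50.0, f_high=None,
                frame_len=0.025, hop_len=0.010):
    """Log frame energy per gammatone channel.

    Returns (image, center_freqs), where image has shape (n_filters, n_frames).
    All defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if f_high is None:
        f_high = 0.45 * fs  # stay below the Nyquist frequency
    cfs = erb_space(f_low, f_high, n_filters)
    frame_n = int(round(frame_len * fs))
    hop_n = int(round(hop_len * fs))
    n_frames = 1 + (len(x) - frame_n) // hop_n
    energy = np.empty((n_filters, n_frames))
    for i, fc in enumerate(cfs):
        # Causal filtering: keep the first len(x) samples of the convolution.
        y = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
        for j in range(n_frames):
            seg = y[j * hop_n: j * hop_n + frame_n]
            energy[i, j] = np.sum(seg ** 2)
    return np.log(energy + 1e-10), cfs
```

The returned 2-D array plays the role of the image-like input described above: rows are cochlear channels, ordered from low to high center frequency, and columns are time frames. A practical pipeline would probably replace the direct convolution with a recursive filter implementation for speed.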
