Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution.

Affiliations

Audio, Speech and Language Processing Group (ASLP@NPU), ASGO, School of Computer Science, Northwestern Polytechnical University, Xi'an, China.

China Mobile Research Institute, China.

Publication information

Neural Netw. 2022 Jun;150:28-42. doi: 10.1016/j.neunet.2022.03.003. Epub 2022 Mar 10.

DOI: 10.1016/j.neunet.2022.03.003
PMID: 35303660

Abstract

A keyword spotting (KWS) system running on smart devices should accurately detect the appearance and predict the location of predefined keywords in audio streams, with a small footprint and high efficiency. To this end, this paper proposes a new two-stage KWS method that combines a novel multi-scale depthwise temporal convolution (MDTC) feature extractor with a two-stage keyword detection and localization module. The MDTC feature extractor learns multi-scale feature representations efficiently with dilated depthwise temporal convolution, modeling both the temporal context and speech rate variation. We use a region proposal network (RPN) as the first-stage KWS. At each frame, we design multiple time regions, all of which take the current frame as the end position but have different start positions. These time regions (formally, anchors) indicate rough location candidates for the keyword. With frame-level features from the MDTC feature extractor as inputs, the RPN learns to generate keyword region proposals based on the designed anchors. To alleviate the keyword/non-keyword class imbalance problem, we introduce a hard example mining algorithm that selects effective negative anchors in RPN training. The keyword region proposals from the first-stage RPN contain keyword location information, which is subsequently used to explicitly extract keyword-related sequential features for training the second-stage KWS. The second-stage system learns to classify each region proposal into a keyword ID and to transform it toward the ground-truth keyword region. Experiments on the Google Speech Command dataset show that the proposed MDTC feature extractor surpasses several competitive feature extractors, with a new state-of-the-art command classification error rate of 1.74%. With the MDTC feature extractor, we further conduct wake-up word (WuW) detection and localization experiments on a commercial WuW dataset. Compared to a strong baseline, our proposed two-stage method achieves a 27-32% relative improvement in false rejection rate at one false alarm per hour, while for keyword localization the two-stage approach achieves a mean intersection-over-union ratio of more than 0.95, which is clearly better than the one-stage RPN method.
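
The anchor design and the intersection-over-union (IoU) measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_anchors` and `temporal_iou`, the inclusive frame indexing, and the anchor lengths are assumptions chosen only for illustration. Each anchor ends at the current frame and differs only in its start frame; the IoU then scores how well a region overlaps a reference keyword region.

```python
from typing import List, Tuple

# A region is (start_frame, end_frame), both inclusive (an assumption of this sketch).
Region = Tuple[int, int]


def make_anchors(end_frame: int, lengths: List[int]) -> List[Region]:
    """Build anchors that all end at `end_frame` but have different start frames."""
    anchors = []
    for length in lengths:
        start = end_frame - length + 1
        if start >= 0:  # skip anchors that would begin before the stream starts
            anchors.append((start, end_frame))
    return anchors


def temporal_iou(a: Region, b: Region) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


# Hypothetical example: four anchor lengths at frame 99, keyword spans frames 40..99.
anchors = make_anchors(end_frame=99, lengths=[20, 40, 60, 80])
keyword = (40, 99)
for anchor in anchors:
    print(anchor, round(temporal_iou(anchor, keyword), 3))
```

In a setup like this, anchors with high IoU against the reference region would typically serve as positive training examples and the rest as negatives. The large number of negatives is the class imbalance that the abstract addresses with hard example mining.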

Cited by

1
A Review of Voice-Based Pain Detection in Adults Using Artificial Intelligence.
Bioengineering (Basel). 2023 Apr 21;10(4):500. doi: 10.3390/bioengineering10040500.