A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition.

Authors

Nemala Sridhar Krishna, Patil Kailash, Elhilali Mounya

Affiliation

The authors are with the Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21218 USA.

Publication information

IEEE Trans Audio Speech Lang Process. 2013 Feb;21(2):416-426. doi: 10.1109/TASL.2012.2219526. Epub 2012 Sep 18.

DOI:10.1109/TASL.2012.2219526
PMID:29928166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6005699/
Abstract

There is strong neurophysiological evidence suggesting that processing of speech signals in the brain happens along parallel paths which encode complementary information in the signal. These parallel streams are organized around a duality of slow vs. fast: Coarse signal dynamics appear to be processed separately from rapidly changing modulations both in the spectral and temporal dimensions. We adapt such duality in a multistream framework for robust speaker-independent phoneme recognition. The scheme presented here centers around a multi-path bandpass modulation analysis of speech sounds with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations along the spectral and temporal dimensions, the proposed scheme avoids the classic feature explosion problem of previous multistream approaches while maintaining the advantage of parallelism and localized feature analysis. The proposed architecture results in substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in presence of nonstationary noise, reverberation and channel distortions.
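
The idea of splitting a spectrogram into parallel streams by bandpass filtering along both the spectral and temporal modulation axes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic channels-by-frames spectrogram, uses a hard Fourier-domain mask in place of whatever filters the paper employs, and the function names, stream names, and band edges (`STREAMS`) are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical band edges: temporal modulation ("rate") in Hz and
# spectral modulation ("scale") in cycles per channel. Not from the paper.
STREAMS = {
    "slow": {"rate_band": (0.0, 4.0), "scale_band": (0.0, 0.1)},
    "fast": {"rate_band": (4.0, 50.0), "scale_band": (0.1, 0.5)},
}


def modulation_bandpass(spec, frame_rate, rate_band, scale_band):
    """Keep only the 2-D modulation content of `spec` inside the given bands.

    spec       : array of shape (n_channels, n_frames)
    frame_rate : frames per second of the spectrogram
    rate_band  : (low, high) temporal modulation limits in Hz
    scale_band : (low, high) spectral modulation limits in cycles/channel
    """
    n_chan, n_frames = spec.shape
    # Absolute modulation frequency of every FFT bin along each axis.
    rates = np.abs(np.fft.fftfreq(n_frames, d=1.0 / frame_rate))
    scales = np.abs(np.fft.fftfreq(n_chan, d=1.0))
    rate_mask = (rates >= rate_band[0]) & (rates <= rate_band[1])
    scale_mask = (scales >= scale_band[0]) & (scales <= scale_band[1])
    # Separable pass-band: a bin survives only if it lies in both bands.
    mask = scale_mask[:, None] & rate_mask[None, :]
    # The mask is symmetric in +/- frequency, so the output is real
    # up to rounding error for a real-valued input.
    return np.real(np.fft.ifft2(np.fft.fft2(spec) * mask))


def multistream_features(spec, frame_rate, streams=STREAMS):
    """Return one bandpass-filtered copy of `spec` per stream."""
    return {
        name: modulation_bandpass(spec, frame_rate, **bands)
        for name, bands in streams.items()
    }
```

Each stream has the same shape as the input spectrogram, so the number of features per stream does not grow with the number of modulation filters. In a recognition system each stream would typically feed its own classifier before fusion; that stage is outside the scope of this sketch.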
