
Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition.

Authors

Nemala Sridhar Krishna, Patil Kailash, Elhilali Mounya

Affiliation

Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, 3400 N Charles Street, Barton Hall, Rm 105, Baltimore, MD, USA.

Publication information

Int J Speech Technol. 2013;16(3):313-322. doi: 10.1007/s10772-012-9184-y. Epub 2012 Dec 18.

DOI: 10.1007/s10772-012-9184-y
PMID: 26412979
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4579853/

Abstract

Humans are quite adept at communicating in the presence of noise. However, most speech processing systems, such as automatic speech and speaker recognition systems, suffer a significant drop in performance when speech signals are corrupted by unseen background distortions. The proposed work explores the use of a biologically motivated multi-resolution spectral analysis for speech representation. This approach focuses on the information-rich spectral attributes of speech and presents an intricate yet computationally efficient analysis of the speech signal through a careful choice of model parameters. Further, the approach takes advantage of an information-theoretic analysis of the message-dominant and speaker-dominant regions in the speech signal, and defines feature representations to address two diverse tasks: speech recognition and speaker recognition. The proposed analysis surpasses the standard Mel-Frequency Cepstral Coefficients (MFCCs) and their enhanced variants (via mean subtraction, variance normalization, and time sequence filtering), and yields significant improvements over a state-of-the-art noise-robust feature scheme on both speech and speaker recognition tasks.
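
The abstract lists mean subtraction and variance normalization among the enhanced MFCC variants used as baselines. As a minimal sketch of what that normalization generally looks like (per-utterance cepstral mean and variance normalization), and not the authors' implementation, the following uses only the Python standard library. The function name `cmvn`, the `eps` parameter, and the toy frames are assumptions made for illustration; they do not come from the paper.

```python
import math


def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization (sketch).

    frames: list of equal-length feature vectors, one per time frame.
    Returns a new list in which every coefficient has zero mean and
    approximately unit variance across the frames of the utterance.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])

    # Mean of each coefficient over time (the "mean subtraction" term).
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]

    # Population standard deviation of each coefficient over time.
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n_frames)
        for k in range(n_coeffs)
    ]

    # eps (an assumed safeguard) avoids division by zero for constant coefficients.
    return [
        [(f[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
        for f in frames
    ]


# Hypothetical toy input: 3 frames with 2 "cepstral" coefficients each.
toy_frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(toy_frames)
```

The intuition is that a stationary channel or convolutional distortion shifts cepstral coefficients by a roughly constant offset, so removing the per-utterance mean and rescaling the variance reduces the mismatch between training and test conditions. The time sequence filtering that the abstract also mentions is not covered by this sketch.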

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd98/4579853/65d073c46e87/10772_2012_9184_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd98/4579853/21cba057366a/10772_2012_9184_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd98/4579853/40ea2d2f9b76/10772_2012_9184_Fig2_HTML.jpg

Similar articles

1
Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition.
Int J Speech Technol. 2013;16(3):313-322. doi: 10.1007/s10772-012-9184-y. Epub 2012 Dec 18.
2
Biomimetic multi-resolution analysis for robust speaker recognition.
EURASIP J Audio Speech Music Process. 2012;2012. doi: 10.1186/1687-4722-2012-22. Epub 2012 Sep 7.
3
Toward Realigning Automatic Speaker Verification in the Era of COVID-19.
Sensors (Basel). 2022 Mar 30;22(7):2638. doi: 10.3390/s22072638.
4
Cepstral representation of speech motivated by time-frequency masking: an application to speech recognition.
J Acoust Soc Am. 1996 Jul;100(1):603-14. doi: 10.1121/1.415961.
5
A bio-inspired feature extraction for robust speech recognition.
Springerplus. 2014 Nov 4;3:651. doi: 10.1186/2193-1801-3-651. eCollection 2014.
6
A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition.
IEEE Trans Audio Speech Lang Process. 2013 Feb;21(2):416-426. doi: 10.1109/TASL.2012.2219526. Epub 2012 Sep 18.
7
Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.
PLoS One. 2018 Oct 10;13(10):e0205355. doi: 10.1371/journal.pone.0205355. eCollection 2018.
8
A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation.
Sensors (Basel). 2021 Jul 28;21(15):5097. doi: 10.3390/s21155097.
9
Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.
Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
10
Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features.
Sensors (Basel). 2022 Dec 1;22(23):9370. doi: 10.3390/s22239370.

Cited by

1
Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Mar;25(3):481-492. doi: 10.1109/TASLP.2016.2639322. Epub 2016 Dec 13.
