Robust Harmonic Features for Classification-Based Pitch Estimation.

Authors

Wang Dongmei, Yu Chengzhu, Hansen John H L

Affiliation

CRSS-CILab: Cochlear Implant Processing Lab, Center for Robust Speech Systems, University of Texas at Dallas, Richardson, TX 75080 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):952-964. doi: 10.1109/TASLP.2017.2667879. Epub 2017 Feb 13.

Abstract

Pitch estimation in diverse naturalistic audio streams remains a challenge for speech processing and spoken language technology. In this study, we investigate the use of robust harmonic features for classification-based pitch estimation. The proposed pitch estimation algorithm is composed of two stages: pitch candidate generation and target pitch selection. Based on energy intensity and spectral envelope shape, five types of robust harmonic features are proposed to reflect pitch-associated harmonic structure. A neural network is adopted to model the relationship between the input harmonic features and the output pitch salience for each specific pitch candidate. In the test stage, the feature vector of each pitch candidate is processed through the neural network, which outputs a salience indicating how likely the candidate is to be the true pitch value. Finally, according to the temporal continuity of pitch values, pitch contour tracking is performed using a hidden Markov model (HMM), and the Viterbi algorithm is used for HMM decoding. Experimental results show that the proposed algorithm outperforms several state-of-the-art pitch estimation methods in accuracy at both high and low levels of additive noise.
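The final tracking stage described in the abstract can be illustrated with a minimal sketch. It assumes that each frame already has a list of pitch candidates and a salience per candidate (for example, a classifier output in (0, 1]). The abstract does not specify the HMM transition model, so the Gaussian penalty on frame-to-frame log-pitch jumps, the `sigma` parameter, and the function name `viterbi_pitch_track` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def viterbi_pitch_track(candidates, saliences, sigma=0.1):
    """Pick one pitch candidate per frame by Viterbi decoding.

    candidates[t][i]: i-th candidate pitch (Hz) in frame t.
    saliences[t][i]:  salience of that candidate, assumed to lie in (0, 1].
    sigma:            assumed spread of log2-pitch jumps between frames
                      (in octaves); a hypothetical tuning parameter.
    """
    floor = 1e-12  # avoids log(0) for zero-salience candidates

    # First frame: log-scores come from salience alone.
    score = [math.log(max(s, floor)) for s in saliences[0]]
    backptrs = []

    for t in range(1, len(candidates)):
        new_score, ptrs = [], []
        for j, f_cur in enumerate(candidates[t]):
            best_i, best_val = 0, -math.inf
            for i, f_prev in enumerate(candidates[t - 1]):
                # Assumed transition model: Gaussian log-penalty on the
                # pitch jump measured in octaves (favors smooth contours).
                jump = math.log2(f_cur / f_prev)
                val = score[i] - 0.5 * (jump / sigma) ** 2
                if val > best_val:
                    best_i, best_val = i, val
            new_score.append(best_val + math.log(max(saliences[t][j], floor)))
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)

    # Backtrace from the best-scoring candidate in the last frame.
    idx = max(range(len(score)), key=score.__getitem__)
    path = [idx]
    for ptrs in reversed(backptrs):
        idx = ptrs[idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]


# Frame 2 has a higher salience on the octave-error candidate (202 Hz),
# but the continuity term keeps the track on the lower contour.
cands = [[100.0, 200.0], [101.0, 202.0], [99.0, 198.0]]
sals = [[0.9, 0.5], [0.4, 0.6], [0.8, 0.3]]
track = viterbi_pitch_track(cands, sals)
print(track)
```

In this toy input, choosing the most salient candidate in each frame independently would jump up an octave in the middle frame; decoding over the whole sequence avoids that, which is the motivation for using temporal continuity in the selection stage.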

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7412/6445256/2254a4543666/nihms-1019979-f0001.jpg

Cited by

1. Open set classification of sound event. Sci Rep. 2024 Jan 13;14(1):1282. doi: 10.1038/s41598-023-50639-7.
2. Speech enhancement for cochlear implant recipients. J Acoust Soc Am. 2018 Apr;143(4):2244. doi: 10.1121/1.5031112.
