Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China.
School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China.
Sensors (Basel). 2022 Aug 12;22(16):6026. doi: 10.3390/s22166026.
Pitch estimation is widely used in speech and audio signal processing. However, current harmonic-structure models used for pitch estimation do not always match the harmonic distribution of real signals. Because of the structure of the vocal tract, the acoustic properties of musical instruments, and spectral leakage, the harmonic frequencies of speech and audio signals often deviate slightly from integer multiples of the pitch. This paper starts from the summation of residual harmonics (SRH) method and makes two main modifications. First, the constraint that spectral peaks lie at strict integer multiples of the pitch is relaxed to allow slight deviation, which helps capture harmonics. Second, a main pitch segment extension scheme with low computational cost is proposed to exploit the smoothness prior of pitch more efficiently. The pitch segment extension scheme is also integrated into the SRH method's voiced/unvoiced decision to reduce short-term errors. Accuracy comparisons with ten pitch estimation methods show that the proposed method has better overall accuracy and robustness. Timing experiments show that, on the experimental computer, the proposed method takes about 1/8 of the time of the state-of-the-art fast NLS method.
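The first modification can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `srh_scores`, the `tolerance` parameter, the rule that the search window grows in proportion to the harmonic frequency, and the synthetic spectrum are all assumptions made for illustration. The score follows the usual SRH form, which adds the spectral amplitude at each harmonic of a candidate pitch and subtracts the amplitude at the midpoints between harmonics. With `tolerance=0` only the bin nearest each strict integer multiple is read; with a positive tolerance the largest value in a small window around it is used instead.

```python
import numpy as np

def srh_scores(spectrum, df, f0_candidates, n_harmonics=5, tolerance=0.0):
    """SRH-style score for each candidate pitch (illustrative sketch only).

    spectrum   : amplitude spectrum sampled every `df` Hz, starting at 0 Hz
    tolerance  : hypothetical relative half-width of the search window around
                 each harmonic; 0.0 reads only the bin nearest to k * f0
    """
    n = len(spectrum)

    def bin_of(freq_hz):
        # Nearest spectrum bin, clamped to the valid index range.
        return min(max(int(round(freq_hz / df)), 0), n - 1)

    scores = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        total = 0.0
        for k in range(1, n_harmonics + 1):
            # Harmonic term: largest amplitude within the allowed deviation.
            lo = bin_of(k * f0 * (1.0 - tolerance))
            hi = bin_of(k * f0 * (1.0 + tolerance))
            total += spectrum[lo:hi + 1].max()
            # Inter-harmonic penalty at (k - 1/2) * f0 for k >= 2.
            if k >= 2:
                total -= spectrum[bin_of((k - 0.5) * f0)]
        scores[i] = total
    return scores

# Synthetic spectrum (made-up data): pitch of 200 Hz whose higher harmonics
# drift above the integer multiples 400, 600, 800, and 1000 Hz.
df = 1.0
freqs = np.arange(0.0, 4001.0, df)
peaks = [200.0, 404.0, 609.0, 816.0, 1025.0]
spectrum = sum(np.exp(-0.5 * (freqs - p) ** 2) for p in peaks)

candidates = np.arange(50.0, 401.0, 1.0)
strict = srh_scores(spectrum, df, candidates, tolerance=0.0)
relaxed = srh_scores(spectrum, df, candidates, tolerance=0.03)

i200 = int(np.where(candidates == 200.0)[0][0])
f0_relaxed = candidates[int(np.argmax(relaxed))]
```

In this toy case the strict score at the 200 Hz candidate sees only the first harmonic, while the relaxed score collects all five, so the relaxed maximum falls close to 200 Hz. The window must stay narrower than half the candidate pitch so that it does not reach the inter-harmonic points used for the penalty.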