Kitayama Itsuki, Hosokawa Kiyohito, Iwaki Shinobu, Yoshida Misao, Miyauchi Akira, Kishikawa Toshihiro, Tanaka Hidenori, Tsuda Takeshi, Sato Takashi, Takenaka Yukinori, Ogawa Makoto, Inohara Hidenori
Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Osaka 565-0871, Japan.
Department of Otorhinolaryngology, Osaka International Medical & Science Center, Osaka 543-0035, Japan.
J Acoust Soc Am. 2024 Dec 1;156(6):4217-4228. doi: 10.1121/10.0034624.
The fundamental frequency (fo) is pivotal for quantifying vocal-fold characteristics. However, fo estimation in hoarse voices is notably inaccurate, and no definitive fo-estimation algorithm has yet been established. In this study, we introduce an algorithm named "Spectral-based fo Estimator Emphasized by Domination and Sequence (SFEEDS)," which enhances the spectrum method, and compare it with conventional estimation methods. We analyzed 454 voice samples and calculated fo using both the conventional methods and SFEEDS. The ground truth of fo was defined as the lowest frequency within the most dominant harmonic complex observed on the spectrogram. We then assessed the concordance between each fo-estimation method and the fo ground truth, and examined how the accuracy of these methods varied when analyzing hoarse speech. Regardless of hoarseness, SFEEDS estimated fo significantly more accurately than the conventional methods. Moreover, whereas the accuracy of the conventional methods decreased in samples with roughness, SFEEDS remained robust and significantly reduced subharmonic errors. The SFEEDS algorithm accurately estimated the fo of both normal and hoarse voices.
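The abstract does not specify how SFEEDS weighs harmonic dominance and sequence, so the sketch below is not SFEEDS. It only illustrates, under stated assumptions, a generic spectrum-based fo estimator (harmonic summation is assumed here as the example technique) and a hypothetical rule for flagging subharmonic errors against a reference fo. The function names `harmonic_sum_fo` and `classify_estimate`, the 60 to 400 Hz search range, the five-harmonic sum, and the 20% tolerance are all illustrative assumptions.

```python
import cmath
import math


def dft_magnitude(signal, fs, freq):
    """Magnitude of the discrete-time Fourier transform of `signal` evaluated at `freq` Hz."""
    acc = 0j
    for n, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq * n / fs)
    return abs(acc)


def harmonic_sum_fo(signal, fs, fmin=60.0, fmax=400.0, step=1.0, n_harmonics=5):
    """Hypothetical harmonic-summation estimator (not SFEEDS).

    For each candidate fo on a grid, sum the spectral magnitude at its first
    `n_harmonics` multiples and return the candidate with the largest sum.
    """
    n_candidates = int((fmax - fmin) / step) + 1
    best_f, best_score = None, -1.0
    for i in range(n_candidates):
        f = fmin + i * step
        score = sum(dft_magnitude(signal, fs, k * f) for k in range(1, n_harmonics + 1))
        if score > best_score:
            best_f, best_score = f, score
    return best_f


def classify_estimate(estimate, ground_truth, tol=0.2):
    """Hypothetical scoring rule: label an fo estimate relative to a reference fo.

    "correct"     -> within +/- tol (relative) of the reference
    "subharmonic" -> near reference/2 or reference/3
    "other"       -> anything else (e.g. upward octave jumps)
    """
    ratio = estimate / ground_truth
    if abs(ratio - 1.0) <= tol:
        return "correct"
    for n in (2, 3):
        if abs(ratio * n - 1.0) <= tol:
            return "subharmonic"
    return "other"


if __name__ == "__main__":
    fs = 8000
    # Synthetic "voice": 150 Hz fundamental plus four harmonics, 0.1 s long.
    sig = [sum(math.sin(2 * math.pi * 150 * k * n / fs) for k in range(1, 6))
           for n in range(800)]
    est = harmonic_sum_fo(sig, fs)
    print(est, classify_estimate(est, 150.0))
```

In a scheme like this, the odd multiples of an fo/2 candidate fall between the true harmonics, so a rough voice with strong subharmonic components there can push the fo/2 candidate past the true fo. The `classify_estimate` rule would count that outcome as a subharmonic error.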