Antonio Salieri Department of Vocal Studies and Vocal Research in Music Education, University of Music and Performing Arts Vienna, Vienna, Austria.
J Voice. 2021 May;35(3):365-375. doi: 10.1016/j.jvoice.2019.11.005. Epub 2020 Mar 9.
Subharmonics are an important class of voice signals, relevant for speech, pathological voice, singing, and animal bioacoustics. They arise from special cases of amplitude modulation (AM) or frequency modulation (FM) of the time-domain signal. Surprisingly, to date there is only one open-source subharmonics detector available to the scientific community: Sun's subharmonic-to-harmonic ratio (SHR). Here, this algorithm was subjected to a formal evaluation with two data sets, one of synthesized and one of empirical speech samples. Both data sets consisted of electroglottographic (EGG) signals, i.e., a physiological correlate of vocal fold oscillation that bypasses vocal tract acoustics. Data Set I contained 2560 synthesized EGG signals with varying degrees of AM and FM, fundamental frequency (fo), periodicity, and signal-to-noise ratio (SNR). Data Set II was made up of 25 EGG samples extracted from the CMU Arctic speech database. For a "ground truth" of subharmonicity, these samples were manually annotated by a group of five external experts. Analysis of the synthesized data suggested that the SHR metric is relatively robust as long as the subharmonic modulation extent is below 0.35 for the FM scenario and below 0.7 for the AM scenario. In the CMU Arctic speech data samples, the SHR analysis reached a maximum sensitivity of about 87% at a specificity of over 90%, but only for adaptive algorithm parameter settings. In contrast, the algorithm's default parameter settings correctly classified only about 9% of all subharmonic instances. The SHR is a useful metric for assessing the degree of subharmonics contained in voice signals, but only at adaptive parameter settings. In particular, the frequency ceiling should be set to five times the highest fo, and the frame length to at least five times the largest fundamental period of the analyzed signal. For subharmonic classification, a threshold of SHR ≥ 0.01 is recommended.
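The recommended adaptive settings can be written down as a short calculation. The following Python sketch is illustrative only and is not the SHR algorithm itself, nor any published implementation of it. The names `ShrSettings`, `adaptive_shr_settings`, `is_subharmonic`, and `sensitivity_specificity` are hypothetical, and the sketch assumes that the lowest and highest fo of the analyzed signal are already known from a prior analysis (the largest fundamental period is taken as the reciprocal of the lowest fo).

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class ShrSettings:
    """Hypothetical container for the adaptive settings named in the abstract."""
    ceiling_hz: float          # frequency ceiling: 5 x highest fo
    min_frame_length_s: float  # frame length: at least 5 x largest fundamental period
    threshold: float = 0.01    # recommended classification threshold (SHR >= 0.01)


def adaptive_shr_settings(fo_min_hz: float, fo_max_hz: float) -> ShrSettings:
    """Derive the settings from the fo range of the analyzed signal (assumed known)."""
    if not 0 < fo_min_hz <= fo_max_hz:
        raise ValueError("expected 0 < fo_min_hz <= fo_max_hz")
    # The largest fundamental period corresponds to the lowest fo.
    largest_period_s = 1.0 / fo_min_hz
    return ShrSettings(
        ceiling_hz=5.0 * fo_max_hz,
        min_frame_length_s=5.0 * largest_period_s,
    )


def is_subharmonic(shr_value: float, threshold: float = 0.01) -> bool:
    """Classify one SHR value using the recommended threshold."""
    return shr_value >= threshold


def sensitivity_specificity(
    predicted: Sequence[bool], annotated: Sequence[bool]
) -> Tuple[float, float]:
    """Compare classifier output with expert annotations.

    Returns (sensitivity, specificity); a value is NaN when its class is absent.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    pairs = list(zip(predicted, annotated))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For a signal whose fo ranges from 100 Hz to 250 Hz, `adaptive_shr_settings(100.0, 250.0)` gives a frequency ceiling of 1250 Hz and a minimum frame length of 0.05 s. How these values are passed to an actual SHR implementation depends on that implementation's interface, which is not specified here.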