Aichinger P, Pernkopf F, Schoentgen J
Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria.
Signal Processing and Speech Communication Laboratory, Graz University of Technology, Inffeldgasse 16c/EG, 8010, Graz, Austria.
Biomed Signal Process Control. 2019 Apr;50:158-167. doi: 10.1016/j.bspc.2019.01.007.
Describing the production kinematics of dysphonic voices plays an important role in the clinical care of voice disorders. However, high-speed videolaryngoscopy is not routinely used in clinical practice, partly because few diagnostic markers can be obtained from high-speed videos automatically. The aim of the study is to propose and test a procedure that automatically detects extra pulses, which may occur in voiced source signals of pathological voices in addition to cyclic pulses.
Glottal area waveforms (GAWs) are synthesized and used to test a detector for extra pulses. For synthesis, each GAW is formed by mixing a cyclic pulse train with an extra pulse train and additive noise. The cyclic pulse trains are varied across GAWs in terms of fundamental frequency, pulse shape, and modulation noise, i.e., jitter and shimmer. The extra pulse trains are varied across GAWs in terms of the height of the extra pulses and their rates of occurrence. The energy level of the additive noise is also varied. For detection, the fundamental frequency is first estimated jointly with the cyclic pulse train waveform, then the modulation noise is estimated, and finally the extra pulse train waveform is estimated. Two versions of the detector are compared: one that parameterizes the shapes of the cyclic pulses, and one that uses unparameterized pulse shape estimates. Two corpora are used for testing: one with 100 GAWs containing random extra pulses, and one with 25 GAWs containing extra pulses in the closed phase of each glottal cycle, representing subharmonic voices.
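The mixing step described above can be illustrated with a minimal Python sketch. It is not the authors' synthesizer: the Hann pulse shape, the Gaussian jitter and shimmer models, the placement of extra pulses in the closed phase, and every name and default value (`synthesize_gaw`, `open_quotient`, `extra_rate`, and so on) are assumptions made for illustration only.

```python
import numpy as np


def synthesize_gaw(fs=4000, duration=0.5, f0=100.0, open_quotient=0.6,
                   jitter=0.01, shimmer=0.02, extra_rate=0.2,
                   extra_height=0.5, noise_db=-30.0, seed=0):
    """Return (gaw, cyclic, extra, noise); all names and defaults are hypothetical."""
    rng = np.random.default_rng(seed)
    n = int(round(fs * duration))
    cyclic = np.zeros(n)
    extra = np.zeros(n)

    def add_pulse(target, start, length, height):
        # Hann-shaped pulse (an assumed shape), truncated at the signal end.
        if length < 2 or start >= n:
            return
        pulse = height * np.hanning(length)
        stop = min(start + length, n)
        target[start:stop] += pulse[:stop - start]

    t = 0.0  # onset time of the current cycle, in seconds
    while t < duration:
        # Jitter perturbs the cycle length, shimmer the pulse height.
        period = (1.0 + jitter * rng.standard_normal()) / f0
        period = max(period, 0.5 / f0)  # guard against degenerate cycles
        height = 1.0 + shimmer * rng.standard_normal()
        start = int(round(t * fs))
        open_len = int(round(open_quotient * period * fs))
        add_pulse(cyclic, start, open_len, height)
        # With probability extra_rate, place an extra pulse in the closed phase.
        if rng.random() < extra_rate:
            closed_len = int(round((1.0 - open_quotient) * period * fs))
            add_pulse(extra, start + open_len, closed_len, extra_height)
        t += period

    # Scale white noise so its energy sits noise_db below the cyclic energy.
    noise = rng.standard_normal(n)
    gain = np.sqrt(np.sum(cyclic ** 2) / np.sum(noise ** 2)
                   * 10.0 ** (noise_db / 10.0))
    noise = gain * noise
    return cyclic + extra + noise, cyclic, extra, noise
```

In this sketch, setting `extra_rate=1.0` puts an extra pulse in every closed phase, which loosely mimics the subharmonic corpus, while smaller values give randomly occurring extra pulses.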
With pulse shape parameterization (PSP), a maximum mean accuracy of 88.3% is achieved when detecting random extra pulses. Without PSP, the maximum mean accuracy drops to 82.9%. Detection performance decreases if the energy level of the additive noise exceeds -25 dB relative to the energy of the cyclic pulse train, and if the irregularity strength exceeds 0.1. For bicyclic, i.e., subharmonic, voices, the approach fails without PSP, whereas with PSP a mean sensitivity of 87.4% is achieved.
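For readers unfamiliar with the two figures of merit, the following Python sketch shows how accuracy and sensitivity are conventionally computed from binary decisions. The abstract does not state the unit over which detections are scored (per cycle, per frame, or per sample), so the per-item labels and the function name `accuracy_and_sensitivity` are assumptions.

```python
import numpy as np


def accuracy_and_sensitivity(truth, predicted):
    """Score binary detections; truth/predicted mark items with an extra pulse."""
    truth = np.asarray(truth, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(truth & predicted)    # extra pulse present and detected
    tn = np.sum(~truth & ~predicted)  # no extra pulse, none reported
    fn = np.sum(truth & ~predicted)   # extra pulse present but missed
    accuracy = (tp + tn) / truth.size
    # Sensitivity is undefined when no extra pulse is present at all.
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return float(accuracy), float(sensitivity)
```

Accuracy counts both correct detections and correct rejections, whereas sensitivity only considers items that truly contain an extra pulse.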
A synthesizer for GAWs containing extra pulses and a detector for extra pulses are proposed. With PSP, favorable detector performance is observed when the additive noise level and the irregularity strength are not too high. In signals with high noise levels, the detector without PSP outperforms the one with PSP. Detection of extra pulses fails if the irregularity strength is large. For subharmonic voices, PSP must be used.