Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Germany.
Artif Intell Med. 2010 May;49(1):51-9. doi: 10.1016/j.artmed.2010.01.001.
This work presents a computer-aided method for automatically and objectively classifying individuals with healthy and dysfunctional vocal fold vibration patterns as depicted in clinical high-speed (HS) videos of the larynx.
Using a specialized image segmentation and vocal fold movement visualization technique, namely phonovibrography, a novel set of numerical features is derived from laryngeal HS videos. These features capture the dynamic behavior and the symmetry of the oscillating vocal folds. To assess their discriminatory power, a support vector machine is applied to the preprocessed data for clinically relevant diagnostic tasks. Finally, the classification performance of the learned nonlinear models is evaluated so that conclusions can be drawn about the suitability of the features and of data resulting from different examination paradigms. As a reference, a second feature set is determined that corresponds to more traditional voice analysis approaches.
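As a rough illustration of what a numerical symmetry feature could look like, the sketch below compares the deflection of the left and right vocal folds. It is not the feature set used in this work: the function name `symmetry_index`, the normalization, and the input layout (two equally sized grids of edge-to-midline distances, one per fold, with rows along the glottal axis and columns as video frames) are all assumptions made for this example.

```python
from typing import Sequence

# Assumed layout: rows are positions along the glottal axis, columns are video frames.
Grid = Sequence[Sequence[float]]


def symmetry_index(left: Grid, right: Grid) -> float:
    """Hypothetical left-right symmetry score in [0, 1].

    Returns 1.0 when both folds show identical deflection at every position
    and frame, and lower values as the deflections diverge.
    """
    if len(left) != len(right):
        raise ValueError("left and right grids need the same number of rows")

    diff_sum = 0.0       # accumulated absolute left-right difference
    magnitude_sum = 0.0  # accumulated absolute deflection of both folds

    for left_row, right_row in zip(left, right):
        if len(left_row) != len(right_row):
            raise ValueError("left and right rows need the same length")
        for l_val, r_val in zip(left_row, right_row):
            diff_sum += abs(l_val - r_val)
            magnitude_sum += abs(l_val) + abs(r_val)

    if magnitude_sum == 0.0:
        # No deflection at all: treat the (motionless) folds as symmetric.
        return 1.0
    return 1.0 - diff_sum / magnitude_sum
```

A score of this kind would be one entry in a per-recording feature vector, which could then be passed to a classifier such as the support vector machine mentioned above.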
For the first time an automatic classification of healthy and pathological voices could be obtained by analyzing the vibratory patterns of vocal folds using phonovibrograms (PVGs). An average classification accuracy of approximately 81% was achieved for 2-class discrimination with PVG features. This exceeds the results obtained through traditional voice analysis features. Furthermore, a relevant influence of phonation frequency on classification accuracy was substantiated by the clinical HS data.
The PVG feature extraction and classification approach is promising for the diagnosis of functional voice disorders. The results indicate that dysfunctional vocal fold vibration can be analyzed objectively with high accuracy (approximately 81% for 2-class discrimination). Moreover, the PVG classification method has potential for the clinical assessment of voice pathologies in general, since it can provide diagnostic support to the voice clinician in a timely and reliable manner. Because classification accuracy was observed to depend on phonation frequency, future comparative studies of HS recordings of oscillating vocal folds should use homogeneous phonation frequencies during examination.