Unger Jakob, Schuster Maria, Hecker Dietmar J, Schick Bernhard, Lohscheller Jörg
Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany.
Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, Marchioninistr. 13, 81366 München, Germany.
Artif Intell Med. 2016 Jan;66:15-28. doi: 10.1016/j.artmed.2015.10.002. Epub 2015 Oct 30.
This work presents a computer-based approach to analyzing the two-dimensional vocal fold dynamics in endoscopic high-speed videos, and constitutes an extension and generalization of a previously proposed wavelet-based procedure. While most approaches are limited to analyzing sustained phonation conditions, the proposed method allows for a clinically adequate analysis of both dynamic and sustained phonation paradigms.
The analysis procedure is based on a spatio-temporal visualization technique, the phonovibrogram, that facilitates the documentation of the visible laryngeal dynamics. From the phonovibrogram, a low-dimensional set of features is computed using a principal component analysis strategy that quantifies vibration pattern type, irregularity, lateral symmetry, and synchronicity as functions of time. Two different test bench data sets are used to validate the approach: (I) 150 healthy and pathologic subjects examined during sustained phonation. (II) 20 healthy and pathologic subjects that were examined twice: during sustained phonation and during a glissando from a low to a higher fundamental frequency. In order to assess the discriminative power of the extracted features, a Support Vector Machine is trained to distinguish between physiologic and pathologic vibrations. The results for sustained phonation sequences are compared to the previous approach. Finally, the classification performance of the stationary analysis procedure is compared to the transient analysis of the glissando maneuver.
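The dimensionality-reduction step described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: phonovibrogram-derived feature vectors are projected onto their leading principal components via a singular value decomposition, yielding a low-dimensional feature set of the kind one would then feed to a Support Vector Machine. The array shapes and the number of retained components are assumptions chosen purely for demonstration.

```python
import numpy as np

# Synthetic stand-in for phonovibrogram feature vectors:
# 150 recordings, each described by 64 raw features (shapes assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 64))

# Principal component analysis via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)                      # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 8                                        # retain a low-dimensional subspace
Z = Xc @ Vt[:k].T                            # PCA scores: 150 x 8 feature set

# Fraction of total variance captured by the k leading components.
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(Z.shape, float(explained))
```

The reduced matrix `Z` plays the role of the low-dimensional feature set; in the paper these features would then be used to train the SVM classifier.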
For the first test bench the proposed procedure outperformed the previous approach (proposed feature set: accuracy: 91.3%, sensitivity: 80%, specificity: 97%, previous approach: accuracy: 89.3%, sensitivity: 76%, specificity: 96%). Comparing the classification performance of the second test bench further corroborates that analyzing transient paradigms provides clear additional diagnostic value (glissando maneuver: accuracy: 90%, sensitivity: 100%, specificity: 80%, sustained phonation: accuracy: 75%, sensitivity: 80%, specificity: 70%).
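The three reported metrics are standard functions of the binary confusion matrix. As a quick consistency check, the counts below assume, purely for illustration, a split of 50 pathologic and 100 healthy subjects in the first test bench; this split is an assumption, chosen because it reproduces the reported percentages exactly.

```python
# Assumed confusion-matrix counts for the 150-subject test bench
# (50 pathologic / 100 healthy is an illustrative assumption).
tp, fn = 40, 10   # pathologic subjects classified correctly / incorrectly
tn, fp = 97, 3    # healthy subjects classified correctly / incorrectly

sensitivity = tp / (tp + fn)                 # true-positive rate
specificity = tn / (tn + fp)                 # true-negative rate
accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct

print(sensitivity, specificity, round(accuracy, 3))  # 0.8 0.97 0.913
```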
The incorporation of parameters describing the temporal evolution of vocal fold vibration clearly improves the automatic identification of pathologic vibration patterns. Furthermore, incorporating a dynamic phonation paradigm provides additional valuable information about the underlying laryngeal dynamics that cannot be derived from sustained phonation conditions. The proposed generalized approach provides a better overall classification performance than the previous approach, and hence constitutes a new, advantageous tool for an improved clinical diagnosis of voice disorders.