Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada.
Department of Physics, Faculty of Science, University of Alberta, Edmonton, AB, Canada.
Am J Otolaryngol. 2022 Mar-Apr;43(2):103327. doi: 10.1016/j.amjoto.2021.103327. Epub 2021 Dec 15.
Early recognition and referral are crucial for voice disorder management. Limited availability of subspecialists, limited awareness in primary care, and the need for specialized equipment impede effective care. Thus, there is a need for a tool to improve voice pathology screening. Machine learning algorithms (MLAs) have shown promise in analyzing acoustic characteristics of phonation. However, few studies report clinical applications of MLAs for voice pathology detection. The objective of this study was to design and validate an MLA for detecting pathological voices.
An MLA was developed for voice analysis. Audio samples were converted into spectrograms and fed into a pre-existing VGG19 convolutional neural network (CNN) image classifier. The resulting feature map was classified as either pathological or healthy using a Support Vector Machine (SVM) binary linear classifier. This combined MLA was trained with 950 sustained /i/ vowel audio samples from the Saarbrucken Voice Database (SVD), which contains subjects with and without voice disorders. The trained MLA was then tested with 406 SVD samples to determine sensitivity, specificity, and overall accuracy. External validation of the MLA was performed using clinical voice samples collected from patients attending a subspecialty voice clinic.
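The three stages described above (spectrogram, feature extraction, linear SVM) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the VGG19 feature extractor is replaced by a simple placeholder (per-frequency mean and standard deviation), the audio is synthetic rather than drawn from the SVD, and the sample rate, frame sizes, noise levels, and all function names are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.svm import LinearSVC

SAMPLE_RATE = 16_000  # Hz; assumed value, not stated in the abstract


def log_spectrogram(signal, frame_len=512, hop=256):
    """Log-magnitude short-time Fourier spectrogram, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def extract_features(spectrogram):
    """Placeholder for the CNN feature map (NOT VGG19):
    mean and standard deviation of each frequency bin over time."""
    return np.concatenate([spectrogram.mean(axis=1), spectrogram.std(axis=1)])


def synthetic_vowel(rng, noisy):
    """Toy 0.5 s harmonic tone; broadband noise is added when noisy is True.
    Purely synthetic stand-in for a sustained vowel recording."""
    t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE
    f0 = rng.uniform(180.0, 240.0)
    tone = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in (1, 2, 3))
    noise_level = 0.8 if noisy else 0.02
    return tone + noise_level * rng.standard_normal(t.size)


def make_dataset(rng, n_per_class):
    """Build feature vectors and labels (0 = clean toy signal, 1 = noisy toy signal)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            signal = synthetic_vowel(rng, noisy=bool(label))
            X.append(extract_features(log_spectrogram(signal)))
            y.append(label)
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
X_train, y_train = make_dataset(rng, 40)   # training split
X_test, y_test = make_dataset(rng, 20)     # held-out split

# Binary linear classifier applied to the extracted features
classifier = LinearSVC(C=1.0, max_iter=10_000)
classifier.fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)
print(f"toy test accuracy: {test_accuracy:.2f}")
```

In a closer reproduction, `extract_features` would presumably be replaced by a forward pass through a pretrained VGG19 applied to the spectrogram rendered as an image. The accuracy printed here reflects only the toy signals and says nothing about clinical performance.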
The MLA detected pathologies in SVD samples with 98.5% sensitivity, 97.1% specificity and 97.8% overall accuracy. In 30 samples obtained prospectively from voice clinic patients, the MLA detected pathologies with 100% sensitivity, 96.3% specificity and 96.7% overall accuracy.
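For reference, sensitivity, specificity, and overall accuracy follow directly from confusion-matrix counts. The abstract does not report those counts; the values used below (3 true positives, 0 false negatives, 26 true negatives, 1 false positive) are an inference, chosen because they sum to 30 samples and reproduce the reported clinic-sample percentages after rounding. The function name is hypothetical.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # pathological voices correctly flagged
    specificity = tn / (tn + fp)               # healthy voices correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # all correct calls over all samples
    return sensitivity, specificity, accuracy


# Hypothetical counts inferred from the reported percentages (n = 30);
# these are NOT taken from the paper.
sens, spec, acc = screening_metrics(tp=3, fn=0, tn=26, fp=1)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Under these assumed counts, a single misclassified healthy sample accounts for the entire gap between 100% and the reported specificity and accuracy, which shows how sensitive such percentages are to individual cases in a sample of 30.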
This study demonstrates that an MLA using a simple audio input can detect diverse vocal pathologies with high sensitivity and specificity. Thus, this algorithm shows promise as a potential screening tool.