Ibarra Emiro J, Parra Jesús A, Alzamendi Gabriel A, Cortés Juan P, Espinoza Víctor M, Mehta Daryush D, Hillman Robert E, Zañartu Matías
Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile.
School of Electrical Engineering, University of the Andes, Mérida, Venezuela.
Front Physiol. 2021 Sep 1;12:732244. doi: 10.3389/fphys.2021.732244. eCollection 2021.
The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels. 
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
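As a reading aid, the two error metrics and the unit conversion behind the reported figures can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the conversion assumes the conventional factor of 1 cm H2O = 98.0665 Pa, which reproduces the parenthetical values in the abstract.

```python
import math

# Assumption: conventional centimeter of water, 1 cm H2O = 98.0665 Pa.
PA_PER_CMH2O = 98.0665


def pa_to_cmh2o(pressure_pa):
    """Convert a pressure from pascals to centimeters of water."""
    return pressure_pa / PA_PER_CMH2O


def mean_absolute_error(estimated, reference):
    """Average magnitude of the estimation error."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def root_mean_square_error(estimated, reference):
    """Square root of the mean squared estimation error."""
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(reference))


# Convert the errors reported in the abstract (values taken from the text).
print(round(pa_to_cmh2o(191), 2))  # reported MAE of 191 Pa
print(round(pa_to_cmh2o(243), 2))  # reported RMSE of 243 Pa
```

The RMSE weights large errors more heavily than the MAE, so the gap between 243 Pa and 191 Pa suggests that some estimates deviate noticeably more than the typical case.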