Donhauser Jonas, Tur Bogac, Döllinger Michael
Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany.
Front Physiol. 2024 Feb 21;15:1282574. doi: 10.3389/fphys.2024.1282574. eCollection 2024.
Vocal fold (VF) vibrations are the primary source of human phonation. High-speed video (HSV) endoscopy enables the computation of descriptive VF parameters for assessing the physiological properties of laryngeal dynamics, i.e., the vibration of the VFs. However, the underlying biomechanical factors responsible for physiological and disordered VF vibrations cannot be accessed through endoscopy. In contrast, physically based numerical VF models provide insights into the organ's oscillations that remain inaccessible through endoscopy. To estimate biomechanical properties, previous research has fitted subglottal pressure-driven mass-spring-damper systems to HSV-recorded VF trajectories, treating this as an inverse problem solved by global optimization of the numerical model. A neural network trained on the numerical model may be used as a substitute for this computationally expensive optimization, yielding a fast-evaluating surrogate of the biomechanical inverse problem. This paper proposes a convolutional recurrent neural network (CRNN)-based architecture trained to regress the parameters of a physiologically based biomechanical six-mass model (6 MM). For comparison with previous research, predictions of the underlying biomechanical factor "subglottal pressure" were tested against 288 porcine HSV recordings. The contributions of this work are twofold. First, the presented CRNN with the 6 MM handles multiple trajectories along the VFs, which allows local changes in VF characteristics to be investigated. Second, the network was trained on synthetic data to reproduce further important biomechanical model parameters, such as VF mass and stiffness. Unlike in a previous study, the network in this study is therefore a complete surrogate of the inverse problem, which allows the fitted model to be computed explicitly with our approach.
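The fitting loop described above can be illustrated with a deliberately simplified sketch. It assumes a single linear mass-spring-damper (not the 6 MM: no collision, no flow coupling, and not self-oscillating) driven by a constant subglottal pressure acting on a hypothetical surface area. All parameter values and the names `simulate` and `fit_pressure` are hypothetical, and a plain grid search stands in for the global optimizer used in previous research.

```python
from typing import List, Sequence

# Hypothetical toy parameters (SI units); NOT the structure or values of the 6 MM.
MASS = 1e-4        # kg
STIFFNESS = 100.0  # N/m, giving a natural frequency of about 159 Hz
DAMPING = 0.01     # N*s/m, damping ratio of about 0.05
AREA = 5e-5        # m^2, hypothetical surface on which the subglottal pressure acts


def simulate(pressure: float, dt: float = 1e-5, steps: int = 2000) -> List[float]:
    """Forward problem: displacement x(t) of m*x'' + r*x' + k*x = P*A from rest.

    Integrated with semi-implicit (symplectic) Euler.
    """
    x, v = 0.0, 0.0
    trajectory = []
    for _ in range(steps):
        accel = (pressure * AREA - DAMPING * v - STIFFNESS * x) / MASS
        v += dt * accel
        x += dt * v
        trajectory.append(x)
    return trajectory


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared difference between two trajectories of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def fit_pressure(observed: Sequence[float], candidates: Sequence[float]) -> float:
    """Inverse problem: return the candidate pressure whose simulation best matches the data.

    A grid search stands in for a global optimizer; every candidate costs one
    full forward simulation, which is what makes optimization-based fitting slow.
    """
    return min(candidates, key=lambda p: mse(simulate(p), observed))


if __name__ == "__main__":
    observed = simulate(800.0)                       # synthetic "recording" with P = 800 Pa
    grid = [float(p) for p in range(200, 2001, 50)]  # candidate pressures in Pa
    print(fit_pressure(observed, grid))              # -> 800.0
```

Because every candidate requires a full forward simulation, a trained network that maps a trajectory to model parameters in a single evaluation can replace the search loop as a fast surrogate. A model such as the 6 MM has several parameters to fit jointly (e.g., masses and stiffnesses), so its search space is far larger than in this one-parameter toy.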
The presented approach achieves a best-case mean absolute error (MAE) of 133 Pa (13.9%) in subglottal pressure prediction, with a correlation of 76.6% on experimental data, as well as a re-estimated fundamental frequency MAE of 15.9 Hz (9.9%). Detailed training analysis revealed subglottal pressure to be the most learnable parameter. With its physiologically based model design and advances in fast parameter prediction, this work is a next step in biomechanical VF model fitting and the estimation of laryngeal kinematics.
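The error measures reported above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code of this study. The relative MAE is assumed to be the MAE divided by the mean magnitude of the ground truth, the correlation is assumed to be Pearson's r, and the fundamental frequency is re-estimated from a trajectory by counting upward mean crossings, a simple stand-in for whichever F0 estimator was actually used. The function names and the 4000 fps frame rate in the example are hypothetical.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data (e.g. Pa or Hz)."""
    if len(pred) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def relative_mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Assumption: MAE normalized by the mean magnitude of the ground truth."""
    return mae(pred, true) / (sum(abs(t) for t in true) / len(true))


def pearson_r(pred: Sequence[float], true: Sequence[float]) -> float:
    """Assumption: 'correlation' is Pearson's r between predictions and ground truth."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


def estimate_f0(signal: Sequence[float], fs: float) -> float:
    """Estimate F0 in Hz from upward mean crossings (a simple stand-in estimator).

    Crossing instants are refined by linear interpolation between samples;
    the signal must contain at least two upward crossings.
    """
    mean = sum(signal) / len(signal)
    x = [s - mean for s in signal]
    crossings = []
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            # fractional sample index at which the segment reaches zero
            crossings.append(i - 1 + (-x[i - 1]) / (x[i] - x[i - 1]))
    if len(crossings) < 2:
        raise ValueError("need at least two upward crossings")
    return (len(crossings) - 1) * fs / (crossings[-1] - crossings[0])


if __name__ == "__main__":
    true_p = [800.0, 1000.0, 1200.0]  # made-up ground-truth pressures (Pa)
    pred_p = [900.0, 950.0, 1250.0]   # made-up predicted pressures (Pa)
    print(mae(pred_p, true_p), relative_mae(pred_p, true_p), pearson_r(pred_p, true_p))

    fs = 4000.0  # hypothetical HSV frame rate in frames per second
    trace = [math.sin(2 * math.pi * 150.0 * n / fs + 0.3) for n in range(400)]
    print(estimate_f0(trace, fs))
```

In this sketch, the re-estimated F0 would be obtained by applying `estimate_f0` to the trajectory of the fitted model and comparing it with the F0 measured in the recording via `mae`.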