Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, 65211, Missouri, USA.
Department of Otolaryngology - Head and Neck Surgery, University of Missouri, Columbia, 65211, Missouri, USA.
Comput Biol Med. 2022 May;144:105339. doi: 10.1016/j.compbiomed.2022.105339. Epub 2022 Feb 28.
The vocal folds (VFs) are a pair of muscles in the larynx that play a critical role in breathing, swallowing, and speaking. VF function can be adversely affected by various medical conditions, including head or neck injuries, stroke, tumors, and neurological disorders. In this paper, we propose a deep learning system for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos to enable objective, quantitative analysis of VF function. The proposed system incorporates our novel orthogonal region selection network and temporal context. This network learns to map its input directly to a VF open/close state without first segmenting or tracking the VF region. This one-step approach drastically reduces manual annotation needs, from labor-intensive segmentation masks or VF motion tracks to frame-level class labels. The proposed spatio-temporal network, with its orthogonal region selection subnetwork, integrates local image features, global image features, and VF state information over time for robust LAR event detection. The proposed network is evaluated against several network variations that incorporate temporal context and outperforms them. The experimental results show promising performance for automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos, with F1 scores over 90% and 99% for LAR and non-LAR frames, respectively.
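To make the frame-level evaluation concrete, the following is a minimal sketch of how per-class F1 scores could be computed from frame-level labels, and how consecutive LAR frames might be grouped into events. It is not the authors' implementation: the function names `f1_for_class` and `frames_to_events`, the label encoding (1 for LAR, 0 for non-LAR), and the run-grouping rule are all assumptions made for illustration, since the abstract does not specify how events are derived from frame predictions.

```python
from typing import List, Tuple


def f1_for_class(y_true: List[int], y_pred: List[int], cls: int) -> float:
    """F1 score for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); return 0.0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def frames_to_events(labels: List[int], positive: int = 1) -> List[Tuple[int, int]]:
    """Group runs of consecutive `positive` frames into (start, end) index pairs.

    Assumption for illustration only: one event equals one unbroken run
    of positive frames, with inclusive end indices.
    """
    events: List[Tuple[int, int]] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == positive and start is None:
            start = i  # a new run begins
        elif lab != positive and start is not None:
            events.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        events.append((start, len(labels) - 1))  # run extends to the last frame
    return events


if __name__ == "__main__":
    # Hypothetical labels: 1 = LAR frame, 0 = non-LAR frame.
    truth = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
    preds = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
    print("F1 (LAR):", f1_for_class(truth, preds, 1))
    print("F1 (non-LAR):", f1_for_class(truth, preds, 0))
    print("Predicted events:", frames_to_events(preds))
```

Reporting F1 separately for each class matters when LAR frames are much rarer than non-LAR frames, because a single overall accuracy figure would then be dominated by the majority class.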