Wang Yang Yang, Hamad Ali S, Lever Teresa E, Bunyak Filiz
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:2167-2172. doi: 10.1109/EMBC44109.2020.9176149.
Vocal folds (VFs) play a critical role in breathing, swallowing, and speech production. VF dysfunctions caused by various medical conditions can significantly reduce patients' quality of life and lead to life-threatening conditions such as aspiration pneumonia, caused by food and/or liquid "invasion" into the windpipe. Laryngeal endoscopy is routinely used in clinical practice to inspect the larynx and to assess VF function. Unfortunately, the resulting videos are only visually inspected, leading to loss of valuable information that could be used for early diagnosis and for disease or treatment monitoring. In this paper, we propose a deep learning-based image analysis solution for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos. Laryngeal endoscopy image analysis is a challenging task because of anatomical variations and various imaging problems. Analysis of LAR events is further complicated by data imbalance, since these are rare events. To tackle this problem, we propose a deep learning system consisting of a two-stream network with a novel orthogonal region selection subnetwork. To the best of our knowledge, this is the first deep learning network that learns to directly map its input to a VF open/close state without first segmenting or tracking the VF region, which drastically reduces the labor-intensive manual annotation needed for mask or track generation. The proposed two-stream network and the orthogonal region selection subnetwork allow integration of local and global information for improved performance. The experimental results show promising performance for the automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos.

Clinical relevance: This paper presents an objective, quantitative, and automatic deep learning-based system for detection of laryngeal adductor reflex (LAR) events in laryngoscopy videos.
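The two-stream design described above can be illustrated with a minimal, hypothetical NumPy sketch: one stream summarizes the whole frame (global context), a second stream summarizes an automatically selected region of interest (local detail, standing in for the paper's region selection subnetwork), and the fused features are mapped to an open/closed state. All function names, weight shapes, and the max-intensity region heuristic are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_features(patch, w):
    # Stand-in for a CNN stream: flatten the patch and project it
    # to a small feature vector with a nonlinearity.
    return np.tanh(patch.reshape(-1) @ w)

def select_region(frame, size=32):
    # Crude stand-in for learned region selection: crop a window
    # centered (clamped to the image) on the brightest pixel.
    y, x = np.unravel_index(np.argmax(frame), frame.shape)
    h, w = frame.shape
    y0 = min(max(y - size // 2, 0), h - size)
    x0 = min(max(x - size // 2, 0), w - size)
    return frame[y0:y0 + size, x0:x0 + size]

def two_stream_predict(frame, w_global, w_local, w_head):
    g = stream_features(frame, w_global)                  # global stream
    l = stream_features(select_region(frame), w_local)    # local stream
    logit = np.concatenate([g, l]) @ w_head               # fuse and classify
    return "closed" if logit > 0 else "open"

# Toy 64x64 "frame" and randomly initialized (untrained) weights.
frame = rng.random((64, 64))
w_global = rng.standard_normal((64 * 64, 8)) * 0.01
w_local = rng.standard_normal((32 * 32, 8)) * 0.01
w_head = rng.standard_normal(16)
print(two_stream_predict(frame, w_global, w_local, w_head))
```

In the actual system, both streams would be convolutional networks trained end to end on labeled video frames, and the region selection would be learned rather than heuristic; the sketch only shows how local and global evidence can be fused into a single open/close decision.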