Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:3957-3960. doi: 10.1109/EMBC46164.2021.9630098.
Assessing the upper airway (UA) of obstructive sleep apnea patients with drug-induced sleep endoscopy (DISE) before potential surgery is standard clinical practice for determining the location of UA collapse. According to the VOTE classification system, UA collapse can occur at the velum (V), oropharynx (O), tongue (T), and/or epiglottis (E). Analyzing DISE videos is not trivial because of anatomical variation, simultaneous UA collapse at several locations, and video distortion caused by mucus or saliva. The first step towards automated analysis of DISE videos is to determine which UA region the endoscope is in at each point in the video: V (velum) or OTE (oropharynx, tongue, or epiglottis). An additional class, X, covers times when the video is too distorted to determine the region. This paper is a proof of concept for classifying UA regions using 24 annotated DISE videos. We propose a convolutional recurrent neural network that combines a ResNet18 architecture with a two-layer bidirectional long short-term memory network. Classification was performed on 5-second video sequences. The network achieved an overall accuracy of 82% and an F1-score of 79% on the three-class problem, showing potential for recognizing regions across patients despite anatomical variation. The results indicate that large-scale training on videos can be used to further predict the location(s), type(s), and degree(s) of UA collapse, and that automatic diagnoses may eventually be derived from DISE videos.
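To make the evaluation setup concrete, the following is a minimal sketch of how per-frame region annotations might be grouped into 5-second sequences and scored on the three-class problem (V, OTE, X). It is not the authors' code. The function names, the frame rate, the non-overlapping windows, the majority-vote window label, and the macro-averaged F1 are all assumptions made for illustration; the abstract states only the window length, the three classes, and the reported accuracy and F1-score.

```python
from collections import Counter

# Three classes named in the abstract: velum, oropharynx/tongue/epiglottis, distorted.
CLASSES = ("V", "OTE", "X")


def window_labels(frame_labels, fps, window_seconds=5.0):
    """Group per-frame labels into fixed-length windows, one label per window.

    Assumptions (not stated in the abstract): windows do not overlap, a
    trailing partial window is dropped, and each window takes the most
    frequent frame label.
    """
    size = int(round(fps * window_seconds))
    if size <= 0:
        raise ValueError("window must contain at least one frame")
    labels = []
    for start in range(0, len(frame_labels) - size + 1, size):
        counts = Counter(frame_labels[start:start + size])
        labels.append(counts.most_common(1)[0][0])
    return labels


def accuracy(y_true, y_pred):
    """Fraction of windows whose predicted class matches the annotation."""
    if not y_true:
        raise ValueError("no samples to score")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores.

    Assumption: the abstract does not say how F1 was averaged across classes.
    """
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage with made-up labels at a hypothetical 2 frames per second.
frames = ["V"] * 10 + ["V"] * 4 + ["X"] * 6 + ["OTE"] * 10 + ["OTE"] * 3
truth = window_labels(frames, fps=2)   # one label per 5-second window
predicted = ["V", "V", "OTE"]          # hypothetical model output
print(truth, accuracy(truth, predicted), macro_f1(truth, predicted))
```

Macro-averaging weights the three classes equally, so a rare class such as X affects the score as much as a common one. A weighted or micro average would give different numbers on the same predictions.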