Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network.

Affiliations

Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany.

Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, München, Germany.

Publication information

PLoS One. 2020 Feb 10;15(2):e0227791. doi: 10.1371/journal.pone.0227791. eCollection 2020.

Abstract

The objective investigation of the dynamic properties of vocal fold vibrations requires the recording and subsequent quantitative analysis of laryngeal high-speed video (HSV). As a first step, quantifying vocal fold vibration patterns requires segmenting the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure that fully automatically segments not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences of 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as that of comparable manual expert segmentations, which can be regarded as the gold standard. The proposed method requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. It therefore also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be made freely available to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
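
The two performance measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `dice_coefficient` and `mean_landmark_error` are hypothetical, the masks are toy examples, and treating landmark precision as the mean Euclidean distance in pixels is an assumption, since the abstract does not define it.

```python
from math import hypot


def dice_coefficient(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are nested lists of 0/1 values with identical shape.
    """
    intersection = 0
    pred_sum = 0
    truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p * t
            pred_sum += p
            truth_sum += t
    if pred_sum + truth_sum == 0:
        # Both masks empty: return 1.0 by convention (an assumption here).
        return 1.0
    return 2.0 * intersection / (pred_sum + truth_sum)


def mean_landmark_error(pred_points, truth_points):
    """Mean Euclidean distance in pixels between paired (x, y) landmarks.

    Assumed definition; the abstract only reports a precision in pixels.
    """
    distances = [
        hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred_points, truth_points)
    ]
    return sum(distances) / len(distances)


# Toy 3x4 masks: 5 predicted pixels, 4 ground-truth pixels, 3 overlapping.
predicted_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
ground_truth_mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

print(dice_coefficient(predicted_mask, ground_truth_mask))
print(mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)]))
```

A DC of 1.0 means the predicted and ground-truth regions coincide exactly, and 0.0 means they do not overlap at all. The reported values of 0.85 to 0.91 therefore indicate substantial but not perfect overlap.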

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/117e/7010264/968cc9eae9ea/pone.0227791.g001.jpg
