GlottisNetV2: Temporal Glottal Midline Detection Using Deep Convolutional Neural Networks.

Author affiliations

Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany.

Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany.

Publication information

IEEE J Transl Eng Health Med. 2023 Jan 19;11:137-144. doi: 10.1109/JTEHM.2023.3237859. eCollection 2023.

DOI:10.1109/JTEHM.2023.3237859

PMID:36816097

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9933989/

Abstract

High-speed videoendoscopy is a major tool for quantitative laryngology. Glottis segmentation and glottal midline detection are crucial for computing vocal fold-specific, quantitative parameters. However, fully automated solutions show limited clinical applicability, and unbiased glottal midline detection in particular remains a challenging problem. We developed a multitask deep neural network for glottis segmentation and glottal midline detection, using techniques from pose estimation to estimate the anterior and posterior points in endoscopy images. Neural networks were set up in TensorFlow/Keras and trained and evaluated with the BAGLS dataset. We found that a dual-decoder deep neural network termed GlottisNetV2 outperforms the previously proposed GlottisNet in terms of mean absolute percentage error (MAPE) on the test dataset (1.85% versus 6.3%) while converging faster. Hyperparameter tuning enables fast and directed training. Using temporally variant data from an additional dataset designed for this task, we lower the median prediction error from 2.1% to 1.76% when using 12 consecutive frames and additional temporal filtering. We found that temporal glottal midline detection using a dual-decoder architecture together with keypoint estimation allows accurate midline prediction. We show that our proposed architecture yields stable and reliable glottal midline predictions that are ready for clinical use and for the analysis of symmetry measures.
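The abstract reports keypoint quality as MAPE and mentions temporal filtering over consecutive frames without defining either in detail. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes the textbook MAPE definition (the paper's exact normalization is not stated in the abstract) and uses a sliding median as a stand-in for the unspecified temporal filter. The function names and the sample coordinates are hypothetical.

```python
from statistics import median


def mape(true_values, predicted_values):
    """Mean absolute percentage error in percent (textbook definition, an assumption here)."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(t - p) / abs(t) for t, p in zip(true_values, predicted_values)]
    return 100.0 * sum(ratios) / len(ratios)


def sliding_median(values, window=3):
    """Median-filter a 1-D sequence; the window shrinks at the sequence edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def smooth_keypoints(points, window=3):
    """Apply the sliding median separately to the x and y coordinates of per-frame points."""
    xs = sliding_median([x for x, _ in points], window)
    ys = sliding_median([y for _, y in points], window)
    return list(zip(xs, ys))


# Hypothetical per-frame predictions of the anterior point in pixels; frame 2 is an outlier.
anterior_pred = [(100, 20), (101, 21), (140, 60), (100, 20), (99, 19)]
anterior_true = [(100, 20)] * 5

smoothed = smooth_keypoints(anterior_pred, window=3)

# Compare the x-coordinate error before and after temporal smoothing.
true_x = [x for x, _ in anterior_true]
raw_error = mape(true_x, [x for x, _ in anterior_pred])
smoothed_error = mape(true_x, [x for x, _ in smoothed])
print(f"raw MAPE: {raw_error:.2f}%, smoothed MAPE: {smoothed_error:.2f}%")
```

In this toy sequence the single outlier frame dominates the raw error, and the median filter suppresses it. That is the general motivation for filtering keypoints across frames, although the filter actually used in the paper may differ.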

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bcc/9933989/19f5dd4f8e82/kist1-3237859.jpg
