Chang Qi, Ahmad Danish, Toth Jennifer, Bascom Rebecca, Higgins William E
School of Electrical Engineering and Computer Science, Penn State University, University Park, PA 16802, USA.
Penn State Milton S. Hershey Medical Center, Hershey, PA 17033, USA.
J Imaging. 2024 Aug 7;10(8):191. doi: 10.3390/jimaging10080191.
For patients at risk of developing either lung cancer or colorectal cancer, the identification of suspect lesions in endoscopic video is an important procedure. The physician performs an endoscopic exam by navigating an endoscope through the organ of interest, be it the lungs or intestinal tract, and performs a visual inspection of the endoscopic video stream to identify lesions. Unfortunately, this entails a tedious, error-prone search over a lengthy video sequence. We propose a deep learning architecture that enables the real-time detection and segmentation of lesion regions from endoscopic video, with our experiments focused on autofluorescence bronchoscopy (AFB) for the lungs and colonoscopy for the intestinal tract. Our architecture, dubbed ESFPNet, draws on a pretrained Mix Transformer (MiT) encoder and a decoder structure that incorporates a new Efficient Stage-Wise Feature Pyramid (ESFP) to promote accurate lesion segmentation. In comparison to existing deep learning models, the ESFPNet model gave superior lesion segmentation performance for an AFB dataset. It also produced superior segmentation results for three widely used public colonoscopy databases and nearly the best results for two other public colonoscopy databases. In addition, the lightweight ESFPNet architecture requires fewer model parameters and less computation than other competing models, enabling the real-time analysis of input video frames. Overall, these studies point to the combined superior analysis performance and architectural efficiency of the ESFPNet for endoscopic video analysis. Lastly, additional experiments with the public colonoscopy databases demonstrate the learning ability and generalizability of ESFPNet, implying that the model could be effective for region segmentation in other domains.
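The decoder idea the abstract describes, fusing multi-scale encoder features stage by stage into a single high-resolution map, can be illustrated with a minimal sketch. This is a generic feature-pyramid fusion toy in numpy, not the actual ESFP decoder from the paper; the function names, the additive fusion rule, and the four-stage MiT-like resolutions are all assumptions for illustration.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def stagewise_fuse(features):
    """Toy stage-wise pyramid fusion (assumed additive rule, not the
    paper's ESFP): start from the coarsest stage, repeatedly upsample
    and merge with the next finer stage."""
    fused = features[-1]  # coarsest stage
    for finer in reversed(features[:-1]):
        fused = upsample2x(fused) + finer
    return fused

# Four encoder stages at halving resolutions, MiT-style (channels = 8).
feats = [np.ones((8, 64 // 2**i, 64 // 2**i)) for i in range(4)]
out = stagewise_fuse(feats)
print(out.shape)  # (8, 64, 64): full-resolution fused feature map
```

A real decoder would use learned convolutions and bilinear interpolation instead of plain addition and nearest-neighbor copying, and would end in a 1-channel segmentation head, but the shape bookkeeping, progressively lifting coarse semantic features back to the finest stage's resolution, is the same.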