
Video Analysis of Small Bowel Capsule Endoscopy Using a Transformer Network.

Author Information

Oh SangYup, Oh DongJun, Kim Dongmin, Song Woohyuk, Hwang Youngbae, Cho Namik, Lim Yun Jeong

Affiliations

School of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, Kwanak-gu, Seoul 08826, Republic of Korea.

Department of Internal Medicine, Dongguk University Ilsan Hospital, Dongguk University College of Medicine, Goyang 10326, Republic of Korea.

Publication Information

Diagnostics (Basel). 2023 Oct 5;13(19):3133. doi: 10.3390/diagnostics13193133.

Abstract

Although wireless capsule endoscopy (WCE) detects small bowel diseases effectively, it has some limitations. For example, the reading process can be time-consuming due to the numerous images generated per case, and lesion detection accuracy may depend on the operator's skill and experience. Hence, many researchers have recently developed deep-learning-based methods to address these limitations. However, they tend to select only a portion of the images from a given WCE video and analyze each image individually. In this study, we note that more information can be extracted from the unused frames and from the temporal relations between sequential frames. Specifically, to increase the accuracy of lesion detection without depending on experts' frame selection skills, we suggest using the whole set of video frames as the input to the deep learning system. Thus, we propose a new Transformer-architecture-based neural encoder that takes the entire video as the input, exploiting the power of the Transformer architecture to extract long-term global correlation within and between the input frames. Subsequently, we can capture the temporal context of the input frames and the attentional features within a frame. Tests on benchmark datasets of four WCE videos showed 95.1% sensitivity and 83.4% specificity. These results may significantly advance automated lesion detection techniques for WCE images.
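The abstract's core idea, letting every frame attend to every other frame so that temporal context is captured across the whole video, can be illustrated with a single-head scaled dot-product self-attention pass over per-frame embeddings. The following is a minimal NumPy sketch, not the authors' actual model; the frame count, embedding dimension, and random feature vectors are hypothetical stand-ins for real WCE frame features.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head self-attention over a sequence of frame embeddings.

    q, k, v: arrays of shape (num_frames, d_model).
    Returns the attended features and the (num_frames, num_frames)
    attention weight matrix, whose entry (i, j) measures how much
    frame i draws on frame j.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # pairwise frame similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
num_frames, d_model = 8, 16  # hypothetical: 8 frame embeddings of dimension 16
frames = rng.standard_normal((num_frames, d_model))

# Every frame attends to every other frame, which is how a Transformer
# encoder captures long-range temporal correlation across the video.
attended, weights = scaled_dot_product_attention(frames, frames, frames)
```

In the full architecture such an attention layer would be stacked with feed-forward layers and applied to features from all frames of the video rather than a hand-selected subset, which is what distinguishes this approach from per-image classifiers.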


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0898/10572266/97631baf63a4/diagnostics-13-03133-g001.jpg
