Chu Xinqi, Poh Chee Khun, Li Liyuan, Chan Kap Luk, Yan Shuicheng, Shen Weijia, Htwe That Mon, Liu Jiang, Lim Joo Hwee, Ong Eng Hui, Ho Khek Yu
Institute for Infocomm Research, Singapore.
Med Image Comput Comput Assist Interv. 2010;13(Pt 2):522-9. doi: 10.1007/978-3-642-15745-5_64.
A video recording of a Wireless Capsule Endoscopy (WCE) examination typically contains more than 55,000 frames, which makes manual visual screening by an experienced gastroenterologist highly time-consuming. In this paper, we propose a novel method of epitomized summarization of WCE videos for efficient visualization to a gastroenterologist. For each short sequence of a WCE video, an epitomized frame is generated. New constraints are introduced into the epitome formulation to achieve the visual quality necessary for manual examination, and an EM algorithm for learning the epitome is derived. First, local context weights are introduced to generate the epitomized frame, which preserves the appearance of all the input patches from the frames of the short sequence. Furthermore, by introducing spatial distributions for the semantic interpretation of image patches into the epitome formulation, we show that the formulation also provides a framework for the semantic description of visual features, producing an organized visual summarization of the WCE video in which patches at different positions correspond to different semantic information. Our experiments on real WCE videos show that, using epitomized summarization, the number of frames that have to be examined by the gastroenterologist can be reduced to less than one-tenth of the number of frames in the original video.