
Keyframe extraction from laparoscopic videos based on visual saliency detection.

Affiliations

Laboratory of Medical Physics, Medical School, National and Kapodistrian University of Athens, Mikras Asias 75 str., Athens 11527, Greece.

School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece.

Publication Information

Comput Methods Programs Biomed. 2018 Oct;165:13-23. doi: 10.1016/j.cmpb.2018.07.004. Epub 2018 Jul 18.

Abstract

BACKGROUND AND OBJECTIVE

Laparoscopic surgery offers the potential for video recording of the operation, which is important for technique evaluation, cognitive training, patient briefing and documentation. An effective way for video content representation is to extract a limited number of keyframes with semantic information. In this paper we present a novel method for keyframe extraction from individual shots of the operational video.

METHODS

The laparoscopic video was first segmented into video shots using an objectness model, which was trained to capture significant changes in the endoscope field of view. Each frame of a shot was then decomposed into three saliency maps in order to model the preference of human vision to regions with higher differentiation with respect to color, motion and texture. The accumulated responses from each map provided a 3D time series of saliency variation across the shot. The time series was modeled as a multivariate autoregressive process with hidden Markov states (HMMAR model). This approach allowed the temporal segmentation of the shot into a predefined number of states. A representative keyframe was extracted from each state based on the highest state-conditional probability of the corresponding saliency vector.
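The accumulation and keyframe-selection steps above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the three saliency maps are random stand-ins, the HMMAR fit is replaced by a fixed split of the shot into contiguous states, and a per-state Gaussian likelihood stands in for the state-conditional probability of the autoregressive model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the three per-frame saliency maps (color, motion,
# texture); in the real pipeline these come from saliency detectors.
n_frames, h, w = 120, 48, 64
maps = rng.random((n_frames, 3, h, w))

# Accumulate each map's response into one scalar per frame, giving the
# 3-D time series of saliency variation across the shot.
series = maps.sum(axis=(2, 3))  # shape (n_frames, 3)

# Hedged stand-in for the HMMAR temporal segmentation: split the shot
# into a predefined number of contiguous states (assumes divisibility).
n_states = 4
labels = np.repeat(np.arange(n_states), n_frames // n_states)

def keyframe_for_state(series, labels, state):
    """Pick the frame whose saliency vector has the highest
    state-conditional likelihood (diagonal Gaussian stand-in)."""
    idx = np.flatnonzero(labels == state)
    x = series[idx]
    mu, var = x.mean(axis=0), x.var(axis=0) + 1e-9
    loglik = -0.5 * ((x - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(axis=1)
    return idx[np.argmax(loglik)]

keyframes = [keyframe_for_state(series, labels, s) for s in range(n_states)]
print(keyframes)  # one representative frame index per state
```

Because each state is a contiguous block of frames, the resulting keyframe indices are ordered in time, one per state, mirroring the shot-level summary the paper describes.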

RESULTS

Our method was tested on 168 video shots extracted from various laparoscopic cholecystectomy operations in the publicly available Cholec80 dataset. Four state-of-the-art methodologies were used for comparison. The evaluation was based on two assessment metrics: the Color Consistency Score (CCS), which measures the color distance between the ground truth (GT) and the closest keyframe, and the Temporal Consistency Score (TCS), which considers the temporal proximity between GT and extracted keyframes. About 81% of the extracted keyframes matched the color content of the GT keyframes, compared to 77% for the second-best method. The TCS of the proposed and second-best methods was approximately 1.9 and 1.4, respectively.
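The abstract does not give the exact CCS formula. As a hedged illustration only, one simple way to quantify the color distance between a GT keyframe and the closest extracted keyframe is an L1 distance between normalized per-channel intensity histograms (the paper's actual metric may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def color_hist(frame, bins=8):
    """Per-channel intensity histogram of an HxWx3 float frame in [0, 1],
    concatenated across channels and normalized to sum to 1."""
    h = np.concatenate([np.histogram(frame[..., c], bins=bins,
                                     range=(0, 1))[0] for c in range(3)])
    return h / h.sum()

def closest_color_distance(gt_frame, keyframes):
    """L1 distance from the GT histogram to the closest keyframe's."""
    g = color_hist(gt_frame)
    return min(np.abs(g - color_hist(k)).sum() for k in keyframes)

# Toy frames: the second keyframe is a near-copy of the GT frame, so the
# closest-keyframe distance should be small.
gt = rng.random((32, 32, 3))
kfs = [rng.random((32, 32, 3)), np.clip(gt + 0.01, 0, 1)]
d = closest_color_distance(gt, kfs)
```

A GT keyframe would then count as "matched" when this distance falls below a chosen threshold, which is one plausible reading of the 81% vs. 77% figures above.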

CONCLUSIONS

Our results demonstrated that the proposed method yields superior performance in terms of content and temporal consistency to the ground truth. The extracted keyframes provided highly semantic information that may be used for various applications related to surgical video content representation, such as workflow analysis, video summarization and retrieval.

