Twinanda Andru Putra, Marescaux Jacques, de Mathelin Michel, Padoy Nicolas
ICube Laboratory, University of Strasbourg, CNRS, IHU, Strasbourg, France
Int J Comput Assist Radiol Surg. 2015 Sep;10(9):1449-60. doi: 10.1007/s11548-015-1183-4. Epub 2015 Apr 7.
One of the advantages of minimally invasive surgery (MIS) is that the underlying digitization provides invaluable information about how procedures are executed under various patient-specific conditions. However, such information can only be obtained conveniently if the laparoscopic video database comes with semantic annotations, which are typically provided manually by experts. Given the growing popularity of MIS, manual annotation becomes a laborious and costly task. In this paper, we tackle the problem of laparoscopic video classification, i.e., automatically identifying the type of abdominal surgery performed in a video. In addition to classifying the full recordings of the procedures, we also carry out sub-video and video clip classifications. These experiments investigate how many frames from a video are needed to achieve good classification performance and which parts of the procedures contain the most discriminative features.
Our classification pipeline is as follows. First, we reject irrelevant frames from the videos based on the color properties of the frames. Second, we extract visual features from the relevant frames. Third, we quantize the features using several feature encoding methods, namely vector quantization, sparse coding (SC), and Fisher encoding. Fourth, we carry out the classification using support vector machines. The sub-video classification is performed by uniformly downsampling the video frames, while the video clip classification is performed by taking three parts of the videos (beginning, middle, and end) and running the classification pipeline separately on each part. Ultimately, we build our final classification model by combining the features using a multiple kernel learning (MKL) approach.
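To make the pipeline concrete, the following is a minimal sketch in Python using NumPy and scikit-learn. The color heuristic, the descriptor extractor, all thresholds, and the toy data are illustrative assumptions rather than the paper's implementation; the encoding shown is plain vector quantization (a bag of visual words), the simplest of the three encodings compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Step 1: reject irrelevant frames with a simple color heuristic.
# Hypothetical rule (not the paper's exact criterion): in-body
# laparoscopic frames are reasonably bright and red-dominant.
def is_relevant(frame_rgb, min_brightness=30.0, min_red_ratio=0.35):
    mean_rgb = frame_rgb.reshape(-1, 3).mean(axis=0)
    red_ratio = mean_rgb[0] / (mean_rgb.sum() + 1e-6)
    return mean_rgb.mean() > min_brightness and red_ratio > min_red_ratio

# Step 2: extract local visual descriptors per frame.
# Random placeholder standing in for real visual features.
def extract_descriptors(frame_rgb, rng, n_desc=20, dim=64):
    return rng.normal(size=(n_desc, dim))

# Step 3: vector quantization -> bag-of-visual-words histogram per video.
def encode_bow(descriptors, codebook):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-6)

# Clip selection for the video clip experiments (begin / middle / end),
# e.g., the last 20 % of a video: vid[clip_slice(len(vid), "end")].
def clip_slice(n_frames, part, frac=0.2):
    k = max(1, int(frac * n_frames))
    if part == "begin":
        return slice(0, k)
    if part == "end":
        return slice(n_frames - k, n_frames)
    start = (n_frames - k) // 2
    return slice(start, start + k)

rng = np.random.default_rng(0)
# Toy "videos": stacks of random frames, red-biased so they pass the
# relevance filter; the labels stand for two surgery types.
videos = []
for _ in range(10):
    vid = rng.integers(0, 255, size=(40, 32, 32, 3)).astype(float)
    vid[..., 0] += 60.0
    videos.append(vid)
labels = np.array([0] * 5 + [1] * 5)

# Pool descriptors from relevant frames, fit the codebook, encode videos.
per_video_desc = []
for vid in videos:
    frames = [f for f in vid if is_relevant(f)]
    per_video_desc.append(
        np.vstack([extract_descriptors(f, rng) for f in frames]))
codebook = KMeans(n_clusters=16, n_init=5, random_state=0).fit(
    np.vstack(per_video_desc))
X = np.array([encode_bow(d, codebook) for d in per_video_desc])

# Step 4: SVM on the encoded videos.
clf = SVC(kernel="rbf").fit(X, labels)
print("train accuracy:", clf.score(X, labels))
```

The SC and Fisher encoding variants would replace encode_bow in this sketch, and the MKL combination of all feature channels is illustrated at the end of the section.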
To carry out the experiments, we use a dataset containing 208 videos of eight different surgery types performed by 10 different surgeons. The results show that SC with K-singular value decomposition (K-SVD) yields the best classification accuracy. They also demonstrate that the classification accuracy decreases by only 3 % when just 60 % of the video frames are used. Furthermore, the end part of the procedures proves to be the most discriminative part of the surgery: using only the last 20 % of the video frames, a classification accuracy above 70 % can be achieved. Finally, the combination of all features yields the best performance of 90.38 % accuracy.
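Since SC with K-SVD gives the best accuracy, a compact sketch of the K-SVD dictionary-learning loop may be useful. This is the textbook alternation (orthogonal matching pursuit for the sparse coding step, a rank-1 SVD per atom for the dictionary update), not the authors' code; the max pooling at the end is an assumed way to turn per-descriptor sparse codes into a video-level representation.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms=32, sparsity=5, n_iter=5, rng=None):
    """Minimal K-SVD: learn a dictionary D so that Y ~ D @ X with
    column-sparse X. Y holds one signal per column, shape (dim, n)."""
    rng = np.random.default_rng(0) if rng is None else rng
    dim, n = Y.shape
    D = rng.normal(size=(dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
    for _ in range(n_iter):
        # Sparse coding step: OMP with a fixed number of nonzeros.
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)
        # Dictionary update: refresh each atom via a rank-1 SVD of the
        # residual restricted to the signals that actually use it.
        for k in range(n_atoms):
            used = np.flatnonzero(X[k])
            if used.size == 0:
                continue
            E = (Y[:, used] - D @ X[:, used]
                 + np.outer(D[:, k], X[k, used]))
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, used] = s[0] * Vt[0]
    return D, X

rng = np.random.default_rng(1)
Y = rng.normal(size=(64, 200))              # e.g., 200 local descriptors
D, X = ksvd(Y)
video_code = np.abs(X).max(axis=1)          # assumed max pooling per video
```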
SC with K-SVD provides the best representation of our videos, yielding the best accuracies for all features. In terms of information, the end part of the laparoscopic videos is more discriminative than the other parts. In addition to performing well individually, the features yield even better classification results when they are all combined using the MKL approach.
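As a rough illustration of the feature combination, the sketch below builds one RBF kernel per feature channel and feeds a weighted sum to a precomputed-kernel SVM. A real MKL solver learns the kernel weights beta jointly with the classifier; here uniform weights and synthetic feature channels are used as stand-ins.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 60
y = rng.integers(0, 2, size=n)
# Two hypothetical per-video feature channels (e.g., two encodings),
# weakly shifted by the class label so the toy problem is learnable.
feats = [rng.normal(loc=y[:, None], size=(n, 32)),
         rng.normal(loc=y[:, None], size=(n, 16))]

# One RBF kernel per channel; MKL would optimize the weights beta_m,
# here uniform weights serve as a stand-in.
kernels = [rbf_kernel(F) for F in feats]
beta = np.ones(len(kernels)) / len(kernels)
K = sum(b * Km for b, Km in zip(beta, kernels))

clf = SVC(kernel="precomputed").fit(K, y)
print("train accuracy:", clf.score(K, y))
```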