Department of Convergence Medicine, Biomedical Engineering Research Center, University of Ulsan College of Medicine, Asan Medical Center, Seoul, South Korea; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States.
Department of Convergence Medicine, Biomedical Engineering Research Center, University of Ulsan College of Medicine, Asan Medical Center, Seoul, South Korea.
Comput Biol Med. 2021 Jun;133:104384. doi: 10.1016/j.compbiomed.2021.104384. Epub 2021 Apr 14.
Recent advances in robotics and deep learning can be applied to endoscopic surgery and offer numerous advantages, such as freeing one of the surgeon's hands. This study aims to automatically detect the tip of a surgical instrument, localize it as a point, and evaluate the detection accuracy in biportal endoscopic spine surgery (BESS). Tip detection could serve as a preliminary step toward the development of vision intelligence in robotic endoscopy.
The dataset contains 2310 frames from 9 BESS videos, with the x and y coordinates of the instrument tip annotated by an expert. We trained two state-of-the-art detectors, RetinaNet and YOLOv2, on bounding boxes centered on the tip annotations with varying margin sizes, to determine the optimal margin size for detecting the instrument tip and localizing the point. To compare models trained with different margin sizes, we computed recall, precision, and F1-score using a fixed-size evaluation box applied to both the ground-truth tip coordinates and the predicted box midpoints.
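As a rough illustration of this setup (a minimal sketch only; the abstract does not give implementation details, so the function names, the square-box construction, and the midpoint-based matching rule are assumptions), the margin-based box construction and fixed-box evaluation could look as follows in Python:

```python
def tip_to_box(tip_xy, margin):
    """Build a square training bounding box centered on an annotated tip.
    tip_xy : (x, y) tip coordinate in pixels
    margin : half-width of the box in pixels (e.g. 150, the optimal value reported)
    """
    x, y = tip_xy
    return (x - margin, y - margin, x + margin, y + margin)

def is_hit(pred_box, gt_tip, eval_half_size):
    """Count a detection as correct if the midpoint of the predicted box falls
    within a fixed-size box centered on the ground-truth tip
    (one plausible reading of the fixed-box criterion described above)."""
    x_min, y_min, x_max, y_max = pred_box
    mid_x, mid_y = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    gx, gy = gt_tip
    return abs(mid_x - gx) <= eval_half_size and abs(mid_y - gy) <= eval_half_size

def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```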
For RetinaNet, a margin size of 150 pixels was optimal, with a recall of 1.000, precision of 0.733, and F1-score of 0.846. For YOLOv2, a margin size of 150 pixels was also optimal, with a recall of 0.864, precision of 0.808, and F1-score of 0.835. The optimal 150-pixel margin for RetinaNet was then used in cross-validation to assess overall robustness, yielding a mean recall, precision, and F1-score of 1.000 ± 0.000, 0.767 ± 0.033, and 0.868 ± 0.022, respectively.
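The reported F1-scores are consistent with the standard harmonic-mean definition of the F1-score; for the RetinaNet result, for example:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.733 \times 1.000}{0.733 + 1.000} \approx 0.846
```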
In this study, we evaluated an automatic tip detection method for surgical instruments in endoscopic surgery, compared two state-of-the-art detection algorithms, RetinaNet and YOLOv2, and validated robustness with cross-validation. The method can be applied to tip detection in other types of endoscopy.