Multimedia Systems Department, Faculty of Electronics, Telecommunications, and Informatics, Gdansk University of Technology, ul. Narutowicza 11/12, Gdansk 80-233, Poland; Systems Research Institute of the Polish Academy of Sciences, ul. Newelska 6, Warsaw 01-447, Poland.
Systems Research Institute of the Polish Academy of Sciences, ul. Newelska 6, Warsaw 01-447, Poland; Biomedical Engineering Department, Faculty of Electronics, Telecommunications, and Informatics, Gdansk University of Technology, ul. Narutowicza 11/12, Gdansk 80-233, Poland.
Med Image Anal. 2018 May;46:244-265. doi: 10.1016/j.media.2018.03.012. Epub 2018 Mar 30.
Localizing instrument parts in video-assisted surgeries is an attractive and open computer vision problem. A working algorithm would immediately find applications in computer-aided interventions in the operating theater. Knowing the location of tool parts could help virtually augment the visual faculty of surgeons, assess the skills of novice surgeons, and increase the autonomy of surgical robots. A surgical tool varies in appearance due to articulation, viewpoint changes, and noise. We introduce a new method for detection and pose estimation of multiple non-rigid and robotic tools in surgical videos. The method uses a rigidly structured, bipartite model of end-effector and shaft parts that consistently encodes diverse, pose-specific appearance mixtures of the tool. This rigid part mixtures model then jointly explains the evolving tool structure by switching between mixture components. Rigidly capturing end-effector appearance allows explicit transfer of keypoint metadata from the detected components for full 2D pose estimation. The detector can also delineate the precise skeleton of the end-effector by transferring additional keypoints. To this end, we propose an effective procedure for learning such rigid mixtures from videos and for pooling the modeled shaft part, which undergoes frequent truncation at the border of the imaged scene. Notably, extensive diagnostic experiments show that feature regularization is key to fine-tuning the model in the presence of inherent appearance bias in videos. Experiments further illustrate that end-effector pose estimation improves when the shaft part is included in the model. We then evaluate our approach on publicly available datasets of in vivo sequences of non-rigid tools and demonstrate state-of-the-art results.