The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy; Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, 56127 Pisa, Italy. Electronic address: https://www.santannapisa.it/it/marta-gherardini.
Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, U.K.; School of Electronic and Electrical Engineering, University of Leeds, Leeds, U.K.
Comput Methods Programs Biomed. 2020 Aug;192:105420. doi: 10.1016/j.cmpb.2020.105420. Epub 2020 Feb 29.
Background and objectives: Automated segmentation and tracking of surgical instruments and catheters under X-ray fluoroscopy hold the potential for enhanced image guidance in catheter-based endovascular procedures. This article presents a novel method for real-time segmentation of catheters and guidewires in 2D X-ray images. We employ Convolutional Neural Networks (CNNs) and propose a transfer learning approach, using synthetic fluoroscopic images, to develop a lightweight version of the U-Net architecture. Our strategy, requiring only a small amount of manually annotated data, streamlines the training process and results in a U-Net model that achieves performance comparable to state-of-the-art segmentation models with a reduced number of trainable parameters.

Methods: The proposed transfer learning approach exploits high-fidelity synthetic images generated from real fluoroscopic backgrounds. We implement a two-stage process, initial end-to-end training followed by fine-tuning, to develop two versions of our model, using synthetic and phantom fluoroscopic images independently. A small number of manually annotated in-vivo images is employed to fine-tune the deepest 7 layers of the U-Net architecture, producing a network specialized for pixel-wise catheter/guidewire segmentation. The network takes a single grayscale image as input and outputs the segmentation result as a binary mask against the background.

Results: Evaluation is carried out on images from in-vivo fluoroscopic video sequences of six endovascular procedures with different surgical setups. We validate the effectiveness of developing the U-Net models with synthetic data in tests where in-vivo fine-tuning and testing are performed both by dividing the data from all procedures into independent fine-tuning/testing subsets and by using different in-vivo sequences. Accurate catheter/guidewire segmentation (average Dice coefficients of ~0.55, ~0.26 and ~0.17) is obtained with both U-Net models. Compared to state-of-the-art CNN models, the proposed U-Net achieves comparable segmentation accuracy (within ±5% average Dice coefficient) while yielding an 84% reduction in testing time. This adds flexibility for real-time operation and makes our network adaptable to increased input resolution.

Conclusions: This work presents a new approach to developing CNN models for pixel-wise segmentation of surgical catheters in X-ray fluoroscopy, exploiting synthetic images and transfer learning. Our methodology reduces the need to manually annotate large volumes of training data, an important advantage given that manual pixel-wise annotation is a key bottleneck in developing CNN segmentation models. Combined with a simplified U-Net model, our work yields significant advantages over current state-of-the-art solutions.
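The synthetic-data step in the Methods can be illustrated with a short sketch: a simulated guidewire is drawn as a smooth dark curve over a real fluoroscopic background, which yields a training image together with its pixel-wise label at no annotation cost. This is a minimal sketch assuming NumPy/SciPy; the spline construction, attenuation factor and function names are illustrative assumptions rather than the authors' actual generation pipeline.

    # Sketch of synthetic-frame generation: a random spline drawn as a dark,
    # wire-like curve on a real fluoroscopic background, plus its binary label.
    # All names and parameters here are illustrative assumptions.
    import numpy as np
    from scipy.interpolate import splev, splprep

    def synthesize_frame(background: np.ndarray, rng: np.random.Generator):
        """Return (image, mask): background with a simulated guidewire and its label."""
        h, w = background.shape
        # Random control points smoothed into a catheter-like cubic spline.
        pts = rng.uniform([0, 0], [w, h], size=(5, 2))
        tck, _ = splprep(pts.T, s=0, k=3)
        x, y = splev(np.linspace(0, 1, 2000), tck)
        image = background.astype(np.float32).copy()
        mask = np.zeros((h, w), dtype=np.uint8)
        for xi, yi in zip(x.astype(int), y.astype(int)):
            if 0 <= xi < w and 0 <= yi < h:
                image[yi, xi] *= 0.4   # X-ray attenuation: the wire appears darker
                mask[yi, xi] = 1       # pixel-wise ground truth comes for free
        return image, mask

    # Example usage (hypothetical loader and file name):
    # bg = load_fluoro_frame("backgrounds/frame_000.png")  # 2-D grayscale array
    # img, lbl = synthesize_frame(bg, np.random.default_rng(0))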
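The two-stage training recipe (end-to-end pre-training on synthetic or phantom data, then fine-tuning only the deepest layers on a few annotated in-vivo frames) can be sketched as follows, assuming a PyTorch implementation. The channel widths, the reduced-depth architecture and the choice of which modules constitute the "deepest 7 layers" are illustrative assumptions, not the published configuration.

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    class LightUNet(nn.Module):
        """Reduced-width U-Net: grayscale input -> single-channel logit mask."""
        def __init__(self):
            super().__init__()
            self.enc1, self.enc2, self.enc3 = conv_block(1, 16), conv_block(16, 32), conv_block(32, 64)
            self.pool = nn.MaxPool2d(2)
            self.bottleneck = conv_block(64, 128)
            self.up3, self.dec3 = nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
            self.up2, self.dec2 = nn.ConvTranspose2d(64, 32, 2, stride=2), conv_block(64, 32)
            self.up1, self.dec1 = nn.ConvTranspose2d(32, 16, 2, stride=2), conv_block(32, 16)
            self.head = nn.Conv2d(16, 1, 1)  # pixel-wise binary logits

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(self.pool(e1))
            e3 = self.enc3(self.pool(e2))
            b = self.bottleneck(self.pool(e3))
            d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
            d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
            d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
            return self.head(d1)

    # Stage 1: end-to-end training on synthetic (or phantom) fluoroscopic images.
    model = LightUNet()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # ... train on synthetic image/mask pairs ...

    # Stage 2: freeze the shallow layers and fine-tune only the deepest modules
    # on a small set of annotated in-vivo frames; treating these 7 modules as
    # the "deepest 7 layers" is an assumption of this sketch.
    deepest = [model.bottleneck, model.up3, model.dec3,
               model.up2, model.dec2, model.up1, model.dec1]
    for p in model.parameters():
        p.requires_grad = False
    for m in deepest:
        for p in m.parameters():
            p.requires_grad = True
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4)
    # ... fine-tune in-vivo, then threshold sigmoid(output) at 0.5 for the mask ...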
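The evaluation metric in the Results, the Dice coefficient between a predicted binary mask and the manual annotation, reduces to a few lines; the function name and smoothing constant below are illustrative assumptions.

    import numpy as np

    def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
        """Dice = 2|P intersect T| / (|P| + |T|) for binary masks."""
        pred = pred.astype(bool)
        target = target.astype(bool)
        intersection = np.logical_and(pred, target).sum()
        return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

    # Example: a mask compared against itself scores exactly 1.0.
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True
    print(dice_coefficient(mask, mask))  # -> 1.0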