Beijing Engineering Research Center of Mixed Reality and Advanced Display, School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, People's Republic of China. These authors contributed equally to this article.
Phys Med Biol. 2020 Aug 31;65(16):165004. doi: 10.1088/1361-6560/ab8dda.
Identification of surgical instruments is crucial for understanding surgical scenarios and providing assistance in endoscopic image-guided surgery. This study proposes a novel multilevel feature-aggregated deep convolutional neural network (MLFA-Net) for identifying surgical instruments in endoscopic images. First, a global feature augmentation layer is created on the top layer of the backbone to improve the localization ability of object identification by injecting high-level semantic information into the feature flow network. Second, a modified cross-channel feature interaction path is proposed to increase the nonlinear combination of features at the same level and improve the efficiency of information propagation. Third, a multiview feature fusion branch is built to aggregate location-sensitive information of the same level from different views, increasing the diversity of feature information and enhancing object localization. By exploiting this latent information, the proposed multilevel feature-aggregation network accomplishes multitask instrument identification with a single network. Three tasks are handled by the proposed network: object detection, which classifies the instrument type and locates its border; mask segmentation, which detects the instrument shape; and pose estimation, which detects the keypoints of instrument parts. Experiments are performed on laparoscopic images from the MICCAI 2017 Endoscopic Vision Challenge, and mean average precision (AP) and average recall (AR) are used to quantify the detection, segmentation, and pose estimation results. For bounding box regression, the AP and AR are 79.1% and 63.2%, respectively; for mask segmentation, the AP and AR are 78.1% and 62.1%; and for pose estimation, the AP and AR reach 67.1% and 55.7%, respectively.
The experiments demonstrate that our method effectively improves the recognition accuracy of instruments in endoscopic images and outperforms other state-of-the-art methods.
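The core idea of aggregating multilevel backbone features by passing high-level semantics down to finer-resolution levels can be sketched as follows. This is a minimal NumPy illustration of a generic FPN-style top-down aggregation, not the authors' exact MLFA-Net implementation; the function names, the additive fusion, and the toy pyramid shapes are assumptions for illustration only.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map
    # (illustrative stand-in for the learned upsampling a real network uses).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_aggregate(pyramid):
    """Fuse a backbone feature pyramid top-down: each level receives the
    upsampled, semantically richer map from the level above and adds it
    to its own features, so low levels gain high-level context."""
    fused = [pyramid[-1]]                      # start from the coarsest level
    for feat in reversed(pyramid[:-1]):        # walk down toward finer levels
        fused.append(feat + upsample2x(fused[-1]))
    return list(reversed(fused))               # finest-to-coarsest order

# Toy 3-level pyramid: 8 channels, spatial sizes 16, 8, 4.
pyr = [np.random.rand(8, 16 // 2**i, 16 // 2**i) for i in range(3)]
out = top_down_aggregate(pyr)
print([f.shape for f in out])  # [(8, 16, 16), (8, 8, 8), (8, 4, 4)]
```

The fused maps keep their original resolutions, so separate task heads (detection, mask, keypoint) can each read from the level best matched to their target scale.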