Wang Yibo, Ye Zhichao, Wen Mingwei, Liang Huageng, Zhang Xuming
Department of Biomedical Engineering, College of Life Science and Technology, Huazhong University of Science and Technology, No 1037, Luoyu Road, Wuhan, China.
Department of Urology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, No 13, Hangkong Road, Wuhan, China.
Med Image Anal. 2024 May;94:103130. doi: 10.1016/j.media.2024.103130. Epub 2024 Mar 2.
Robot-assisted prostate biopsy is a new technology for diagnosing prostate cancer, but its safety is compromised by the inability of robots to accurately sense the tool-tissue interaction force during biopsy. Recently, vision-based force sensing (VFS) has offered a potential solution to this issue by utilizing image sequences to infer the interaction force. However, the existing mainstream VFS methods cannot achieve accurate force sensing because they adopt convolutional or recurrent neural networks to learn deformation from optical images, and some of these methods are inefficient, especially when recurrent convolutional operations are involved. This paper presents a Transformer-based VFS (TransVFS) method that leverages ultrasound volume sequences acquired during prostate biopsy. The TransVFS method uses a spatio-temporal local-global Transformer to capture local image details and global dependencies simultaneously, learning prostate deformations for force estimation. Distinctively, our method exploits both spatial and temporal attention mechanisms for image feature learning, thereby addressing the adverse effects of low ultrasound image resolution and unclear prostate boundaries on accurate force estimation. Meanwhile, two efficient local-global attention modules are introduced to reduce the 4D spatio-temporal computation burden via a factorized spatio-temporal processing strategy, thereby enabling fast force estimation. Experiments on a prostate phantom and beagle dogs show that our method significantly outperforms existing VFS methods and other spatio-temporal Transformer models. TransVFS surpasses the most competitive comparison method, ResNet3dGRU, with a mean absolute force-estimation error of 70.4 ± 60.0 millinewtons (mN) vs. 123.7 ± 95.6 mN on the transabdominal ultrasound dataset of dogs.
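To make the factorized spatio-temporal strategy concrete, the sketch below shows the general idea in PyTorch: rather than joint attention over all T × N tokens of a 4D volume sequence, attention is applied spatially within each volume and then temporally across volumes, with a regression head producing a force vector. This is a minimal illustration of the generic factorization technique, not the authors' TransVFS implementation; the class names (SpatioTemporalBlock, ForceRegressor), token shapes, and all hyperparameters are assumptions for the example.

```python
# Minimal sketch of factorized spatio-temporal attention for force
# regression from an ultrasound volume sequence. Assumes patch tokens
# have already been extracted from each 3D volume; all names and sizes
# here are illustrative, not from the TransVFS paper.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Factorized attention: spatial attention within each volume,
    then temporal attention across the sequence, instead of joint
    attention over all T*N tokens at once."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, D) -- B sequences, T volumes, N patch tokens, D channels
        B, T, N, D = x.shape

        # Spatial attention: tokens attend only within their own volume.
        s = x.reshape(B * T, N, D)
        s_norm = self.norm1(s)
        s = s + self.spatial_attn(s_norm, s_norm, s_norm)[0]

        # Temporal attention: each spatial location attends across time.
        t = s.reshape(B, T, N, D).permute(0, 2, 1, 3).reshape(B * N, T, D)
        t_norm = self.norm2(t)
        t = t + self.temporal_attn(t_norm, t_norm, t_norm)[0]

        return t.reshape(B, N, T, D).permute(0, 2, 1, 3)


class ForceRegressor(nn.Module):
    """Pools the spatio-temporal tokens and regresses a 3-axis force."""

    def __init__(self, dim: int = 64, depth: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(SpatioTemporalBlock(dim) for _ in range(depth))
        self.head = nn.Linear(dim, 3)  # hypothetical (Fx, Fy, Fz) output

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            tokens = blk(tokens)
        return self.head(tokens.mean(dim=(1, 2)))  # global average pool


# Toy usage: 2 sequences of 4 volumes, each tokenized into 128 patches of dim 64.
tokens = torch.randn(2, 4, 128, 64)
force = ForceRegressor()(tokens)
print(force.shape)  # torch.Size([2, 3])
```

The payoff of the factorization is the attention cost: joint attention over the sequence scales with (T·N)², whereas the spatial-then-temporal split scales with T·N² + N·T², which is why such schemes are commonly used to keep 4D spatio-temporal Transformers tractable.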