Chi Jianning, Miao Jian, Chen Jia-Hui, Wang Huan, Yu Xiaosheng, Huang Ying
Faculty of Robot Science and Engineering, Northeastern University, Zhihui Street, Shenyang, 110169, Liaoning, China.
Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Zhihui Street, Shenyang, 110169, Liaoning, China.
J Imaging Inform Med. 2024 Dec;37(6):3264-3281. doi: 10.1007/s10278-023-00935-5. Epub 2024 Jun 5.
Thyroid ultrasound video provides significant value for the diagnosis of thyroid diseases, but the ultrasound imaging process is often affected by speckle noise, resulting in poor video quality. Numerous video denoising methods have been proposed to remove noise while preserving texture details. However, existing methods still suffer from the following problems: (1) relevant temporal features in low-contrast ultrasound video cannot be accurately aligned and effectively aggregated by simple optical flow or motion estimation, resulting in artifacts and motion blur in the video; (2) the fixed receptive field used in spatial feature integration lacks the flexibility to aggregate features over the global region of interest and is susceptible to interference from irrelevant noisy regions. In this work, we propose a deformable spatial-temporal attention denoising network to remove speckle noise from thyroid ultrasound video. The entire network follows a bidirectional feature propagation mechanism to efficiently exploit the spatial-temporal information of the whole video sequence. Within this framework, two modules are proposed to address the above problems: (1) a deformable temporal attention module (DTAM), applied after optical flow pre-alignment, further captures and aggregates relevant temporal features according to learned inter-frame offsets, so that inter-frame information can be better exploited even when flow estimation is imprecise under the low contrast of ultrasound video; (2) a deformable spatial attention module (DSAM) flexibly integrates spatial features over the global region of interest through learned intra-frame offsets, so that irrelevant noisy information is ignored and essential information is precisely exploited. Finally, all the refined features are rectified and merged through residual convolution blocks to recover clean video frames.
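The core operation shared by DTAM and DSAM, sampling features at learned fractional offsets and combining the sampled values with attention weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, and the softmax weighting here are illustrative assumptions only, and in the actual network the offsets and weights would be predicted by convolutional layers rather than supplied directly.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2-D feature map at fractional coordinates (y, x)."""
    H, W = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return (feat[y0, x0] * (1 - wy) * (1 - wx)
            + feat[y0, x1] * (1 - wy) * wx
            + feat[y1, x0] * wy * (1 - wx)
            + feat[y1, x1] * wy * wx)

def deformable_aggregate(feat, offsets, weights):
    """Aggregate features at positions shifted by learned offsets.

    feat:    (H, W) feature map, e.g. a flow-pre-aligned neighbor frame
    offsets: (H, W, K, 2) learned (dy, dx) per pixel for K sampling points
    weights: (H, W, K) attention logits over the K sampled values
    """
    H, W, K, _ = offsets.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # Sample K deformed locations around pixel (i, j).
            vals = np.array([bilinear_sample(feat,
                                             i + offsets[i, j, k, 0],
                                             j + offsets[i, j, k, 1])
                             for k in range(K)])
            # Softmax over the K attention logits, then weighted sum.
            w = np.exp(weights[i, j] - weights[i, j].max())
            out[i, j] = (w / w.sum() * vals).sum()
    return out
```

With all offsets at zero and uniform weights, the operation reduces to an identity over the feature map; non-zero offsets let each output pixel attend to relevant content anywhere in its neighborhood, which is what allows the modules to ignore irrelevant noisy regions.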
Experimental results on our thyroid ultrasound video (US-V) dataset and the DDTI dataset demonstrate that our proposed method outperforms other state-of-the-art methods by 1.2–1.3 dB in PSNR and produces clearer texture detail. The proposed model can also assist thyroid nodule segmentation methods in achieving more accurate segmentation, which provides an important basis for thyroid diagnosis. In the future, the model can be improved and extended to other medical image sequence datasets, including CT and MRI slice denoising. The code and datasets are available at https://github.com/Meta-MJ/DSTAN.
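The PSNR metric used in the comparison above is the standard peak signal-to-noise ratio; a minimal NumPy sketch of it (assuming 8-bit intensity range, which is the usual convention for ultrasound frames but is an assumption here) is:

```python
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((clean.astype(np.float64) - denoised.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For a video, the reported value is typically the PSNR averaged over all frames, so a 1.2–1.3 dB gain reflects a consistently lower mean squared error across the sequence.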