Xue Hui, Hooper Sarah M, Davies Rhodri H, Treibel Thomas A, Pierce Iain, Stairs John, Naegele Joseph, Manisty Charlotte, Moon James C, Campbell-Washburn Adrienne E, Kellman Peter, Hansen Michael S
Microsoft Research, Health Futures, Redmond, WA, USA.
National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA.
ArXiv. 2025 Apr 13:arXiv:2504.10534v1.
To propose a flexible and scalable imaging transformer (IT) architecture with three attention modules for multi-dimensional imaging data and apply it to MRI denoising with very low input SNR.
Three independent attention modules were developed: spatial local, spatial global, and frame attention. Together they capture long-range signal correlation while preserving the locality of information in images. An attention-cell-block design processes 5D tensors ([B, C, F, H, W]) for 2D, 2D+T, and 3D image data, and a High-Resolution Network (HRNet) backbone was built to hold the IT blocks. The training dataset consisted of 206,677 cine series and the test datasets contained 7,267 series. Ten input SNR levels from 0.05 to 8.0 were tested, and the IT models were compared to seven convolutional and transformer baselines. To test scalability, four IT models with 27m to 218m parameters were trained. Two senior cardiologists reviewed the IT model outputs, from which the ejection fraction (EF) was measured and compared against the ground truth.
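The frame attention described above can be illustrated with a minimal NumPy sketch: a 5D tensor in the [B, C, F, H, W] layout is rearranged so that each spatial location attends across all frames, which is how temporal correlation in 2D+T cine data can be captured. The tensor layout and module name come from the abstract; the single-head, identity-projection internals are an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_attention(x):
    """Self-attention across the frame (F) axis of a [B, C, F, H, W] tensor.

    Single head with identity Q/K/V projections for brevity: each spatial
    location (h, w) attends over all F frames. Illustrative sketch only;
    the paper's attention modules include learned projections.
    """
    B, C, F, H, W = x.shape
    # move frames to the sequence axis: [B, H, W, F, C]
    seq = x.transpose(0, 3, 4, 2, 1)
    q = k = v = seq
    # scaled dot-product attention over frames: scores are [B, H, W, F, F]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(C)
    out = softmax(scores, axis=-1) @ v             # [B, H, W, F, C]
    return out.transpose(0, 4, 3, 1, 2)            # back to [B, C, F, H, W]

# toy 2D+T batch: 2 series, 8 channels, 12 frames, 16x16 pixels
x = np.random.default_rng(0).normal(size=(2, 8, 12, 16, 16))
y = frame_attention(x)
```

A spatial-global attention module would follow the same pattern with H*W flattened into the sequence axis, and spatial-local attention would restrict it to windows of neighboring pixels.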
The IT models significantly outperformed the other models across the tested SNR levels, with the performance gap most prominent at low SNR. The IT-218m model achieved the highest SSIM and PSNR, restoring good image quality and anatomical detail even at an SNR of 0.2. Both experts agreed that at this SNR or above, the IT model output supported the same clinical interpretation as the ground truth, and the model produced images whose EF measurements agreed with the ground-truth values.
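The very-low-SNR inputs discussed above can be emulated by scaling additive Gaussian noise to a target SNR. This is a minimal sketch under an assumed SNR convention (mean signal magnitude divided by noise standard deviation); the paper's actual noise model and SNR definition may differ.

```python
import numpy as np

def add_noise_at_snr(image, target_snr, rng):
    """Corrupt an image with Gaussian noise at a target SNR.

    SNR is taken here as mean signal magnitude / noise standard
    deviation -- an illustrative convention, not the paper's protocol.
    """
    sigma = np.abs(image).mean() / target_snr
    return image + rng.normal(scale=sigma, size=image.shape)

rng = np.random.default_rng(0)
clean = np.ones((64, 64))  # toy image with unit signal level

# sweep a few of the tested SNR levels, from heavy to light noise
noisy_by_snr = {snr: add_noise_at_snr(clean, snr, rng)
                for snr in (0.05, 0.2, 1.0, 8.0)}
```

At SNR 0.2 the noise standard deviation is five times the mean signal level, which conveys how little of the anatomy is directly visible in the inputs the model restores.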
The imaging transformer model offers strong performance, scalability, and versatility for MRI denoising. It recovers image quality suitable for confident clinical reading and accurate EF measurement, even at a very low input SNR of 0.2.