Chang Zhixing, Shang Jiawen, Fan Yuhan, Huang Peng, Hu Zhihui, Zhang Ke, Dai Jianrong, Yan Hui
Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
Quant Imaging Med Surg. 2025 Sep 1;15(9):8611-8626. doi: 10.21037/qims-2024-2962. Epub 2025 Aug 13.
Cone-beam computed tomography (CBCT) is a three-dimensional (3D) imaging method designed for routine target verification of cancer patients during radiotherapy. The images are reconstructed from a sequence of projection images acquired by the on-board imager attached to a radiotherapy machine. CBCT images are usually stored in a health information system, but the projection images are mostly discarded because of their large volume. To store them economically, this study investigated a deep learning (DL)-based super-resolution (SR) method for compressing the projection images.
In image compression, the high-resolution (HR) projection images were down-sampled by a given factor into low-resolution (LR) images, which were then encoded into a video file. In image restoration, the LR images were decoded from the video file and then up-sampled to HR projection images by the DL network. Three SR DL networks, a convolutional neural network (CNN), a residual network (ResNet), and a generative adversarial network (GAN), were tested together with three video coding-decoding (CODEC) algorithms: Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and AOMedia Video 1 (AV1). Using two databases, one of natural images and one of projection images, the performance of the SR networks and video codecs was evaluated with the compression ratio (CR), peak signal-to-noise ratio (PSNR), video quality metric (VQM), and structural similarity index measure (SSIM).
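The compress/restore round trip and two of the metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: down-sampling is assumed to be block averaging, the trained SR network is replaced by a hypothetical nearest-neighbour `upsample_placeholder`, the lossy AVC/HEVC/AV1 encoding step is omitted, and the 16-bit peak value (65535) used for PSNR is an assumed bit depth. All function names are hypothetical.

```python
import numpy as np


def downsample(hr: np.ndarray, dsf: int) -> np.ndarray:
    """Block-average an HR projection by an integer down-sampling factor (DSF)."""
    h, w = hr.shape
    if h % dsf or w % dsf:
        raise ValueError("image size must be divisible by the DSF")
    # Group pixels into dsf x dsf blocks and average each block.
    return hr.reshape(h // dsf, dsf, w // dsf, dsf).mean(axis=(1, 3))


def upsample_placeholder(lr: np.ndarray, dsf: int) -> np.ndarray:
    """Nearest-neighbour up-sampling; a hypothetical stand-in for the trained SR network."""
    return np.repeat(np.repeat(lr, dsf, axis=0), dsf, axis=1)


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


def compression_ratio(original_bytes: int, stored_bytes: int) -> float:
    """CR = size of the original HR data / size of the stored (encoded) data."""
    return original_bytes / stored_bytes


if __name__ == "__main__":
    # Synthetic smooth 16-bit image standing in for a CBCT projection (assumption).
    y, x = np.mgrid[0:256, 0:256]
    hr = (30000 + 20000 * np.sin(x / 20.0) * np.cos(y / 30.0)).astype(np.uint16)
    for dsf in (2, 4):
        lr = downsample(hr.astype(np.float64), dsf)
        restored = upsample_placeholder(lr, dsf)
        # Without a video codec, the stored size is just the raw LR pixel data.
        cr = compression_ratio(hr.nbytes, np.round(lr).astype(np.uint16).nbytes)
        print(f"DSF {dsf}: raw CR {cr:.1f}, PSNR {psnr(hr, restored, 65535.0):.2f} dB")
```

Down-sampling alone reduces the stored pixel count by DSF², and the video codec compresses the LR frames further; a trained SR network would replace the placeholder to recover higher PSNR than nearest-neighbour repetition.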
The codec AV1 achieved the highest CR among the three codecs. Its CRs were 13.91, 42.08, 144.32, and 289.80 for down-sampling factors (DSFs) of 0 (non-SR), 2, 4, and 6, respectively. Among the three SR networks, ResNet achieved the best restoration accuracy. For the four DSFs, its PSNRs were 69.08, 41.60, 37.08, and 32.44 dB; its VQMs were 0.06%, 3.65%, 6.95%, and 13.03%; and its SSIMs were 0.9984, 0.9878, 0.9798, and 0.9518, respectively. As the DSF increased, the CR increased steeply, while the quality of the restored images degraded only modestly.
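The reported AV1 compression ratios can be put in perspective with a little arithmetic: dividing each CR by the non-SR baseline gives the extra gain contributed by down-sampling, which can be compared with the DSF² reduction in pixel count. The dictionary below only restates the values given above; the function name is hypothetical.

```python
# AV1 compression ratios reported above, keyed by DSF (0 = non-SR baseline).
AV1_CR = {0: 13.91, 2: 42.08, 4: 144.32, 6: 289.80}


def cr_gain_over_baseline(cr_by_dsf: dict) -> dict:
    """Ratio of each SR-mode CR to the non-SR CR (hypothetical helper)."""
    base = cr_by_dsf[0]
    return {dsf: cr / base for dsf, cr in cr_by_dsf.items() if dsf != 0}


if __name__ == "__main__":
    for dsf, gain in cr_gain_over_baseline(AV1_CR).items():
        print(f"DSF {dsf}: CR gain {gain:.2f}x vs pixel-count reduction {dsf ** 2}x")
```

The gains come out at roughly 3.0x, 10.4x, and 20.8x: they grow faster than the DSF itself but stay below DSF², since the LR frames are less redundant than the HR frames and so compress somewhat less efficiently in the codec.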
Applying the SR model can further improve the CR beyond that achieved by the video encoders alone. This compression method is not only effective for two-dimensional (2D) projection images but is also applicable to the 3D images used in radiotherapy.