
Multimodal medical image-to-image translation via variational autoencoder latent space mapping.

Author information

Liang Zhiwen, Cheng Mengjie, Ma Jinhui, Hu Ying, Li Song, Tian Xin

Affiliations

Electronic Information School, Wuhan University, Wuhan, China.

Cancer Center, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.

Publication information

Med Phys. 2025 Jul;52(7):e17912. doi: 10.1002/mp.17912. Epub 2025 May 29.

Abstract

BACKGROUND

Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice.

PURPOSE

To develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands.

METHODS

We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset comprises paired imaging data from 87 head and neck cases, each containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2w images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset using the mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetry characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value.
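The MAE and PSNR metrics named above are standard and can be sketched directly; the snippet below is a minimal illustration, not the authors' evaluation code, and it treats images as flat lists of intensities (the `data_range` parameter is the assumed maximum intensity span, e.g. the HU window for CT):

```python
import math

def mae(pred, target):
    """Mean absolute error between two images, given as flat lists of intensities."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the maximum possible intensity span of the images
    (e.g. 255 for 8-bit images, or the HU window width for CT).
    """
    assert len(pred) == len(target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 20.0 * math.log10(data_range / math.sqrt(mse))
```

For example, a prediction that is uniformly off by the full data range gives 0 dB, while smaller errors give correspondingly higher PSNR. SSIM is more involved (local means, variances, and covariances over sliding windows) and is typically taken from an established implementation such as `skimage.metrics.structural_similarity` rather than reimplemented.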

RESULTS

The VAE with the Kullback-Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%. In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield units (HU) in the CBCT-to-CT task versus 45.23 ± 13.21 HU and 47.55 ± 13.88 HU in the MR T1c-to-CT and T2w-to-CT tasks, respectively. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all the gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images.

CONCLUSIONS

The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.

