XRayWizard: Reconstructing 3-D lung surfaces from a single 2-D chest x-ray image via Vision Transformer.

Authors

Shi Zhiyi, Geng Kaiwen, Zhao Xiaoyan, Mahmoudi Farhad, Haas Christopher J, Leader Joseph K, Duman Emrah, Pu Jiantao

Affiliations

Department of Radiology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Publication information

Med Phys. 2024 Apr;51(4):2806-2816. doi: 10.1002/mp.16781. Epub 2023 Oct 11.

DOI:10.1002/mp.16781

PMID:37819009

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12923332/

Abstract

BACKGROUND

Chest x-ray is widely utilized for the evaluation of pulmonary conditions due to its technical simplicity, cost-effectiveness, and portability. However, as a two-dimensional (2-D) imaging modality, chest x-ray images depict limited anatomical details and are challenging to interpret.

PURPOSE

To validate the feasibility of reconstructing three-dimensional (3-D) lungs from a single 2-D chest x-ray image via Vision Transformer (ViT).

METHODS

We created a cohort of 2525 paired chest x-ray images (scout images) and computed tomography (CT) scans acquired from different subjects and randomly partitioned them into (1) a training set of 1800 pairs, (2) a validation set of 200 pairs, and (3) a testing set of 525 pairs. The 3-D lung volumes segmented from the chest CT scans were used as the ground truth for supervised learning. We developed a novel model termed XRayWizard that employed ViT blocks to encode the 2-D chest x-ray image. The aim was to capture global information and establish long-range relationships, thereby improving the performance of 3-D reconstruction. Additionally, a pooling layer was introduced at the end of each transformer block to extract feature information. To produce smoother and more realistic 3-D models, a set of patch discriminators was incorporated. We also devised a novel method to incorporate subject demographics as an auxiliary input to further improve the accuracy of 3-D lung reconstruction. Dice coefficient and mean volume error were used as performance metrics to quantify the agreement between the computerized results and the ground truth.
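The two performance metrics named above can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code. The function names are hypothetical, and the volume error is assumed here to be the absolute difference in voxel count expressed as a percentage of the ground-truth count, since the abstract does not give the exact formula.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = int(pred.sum()) + int(truth.sum())
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, not from the paper).
        return 1.0
    return 2.0 * int(np.logical_and(pred, truth).sum()) / total


def volume_error_percent(pred, truth):
    """Assumed definition: |V_pred - V_truth| / V_truth * 100, with volumes as voxel counts."""
    pred_voxels = int(np.asarray(pred, dtype=bool).sum())
    truth_voxels = int(np.asarray(truth, dtype=bool).sum())
    return 100.0 * abs(pred_voxels - truth_voxels) / truth_voxels


# Toy volumes: the "ground truth" is a 2x2x2 cube (8 voxels); the "prediction"
# contains that cube plus one extra slice (12 voxels in total).
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True

dice = dice_coefficient(pred, truth)         # 2 * 8 / (12 + 8) = 0.8
vol_err = volume_error_percent(pred, truth)  # 100 * (12 - 8) / 8 = 50.0
print(f"Dice = {dice:.3f}, volume error = {vol_err:.1f}%")
```

In a real evaluation the masks would be the reconstructed and CT-segmented lung volumes resampled to a common grid, and the per-subject values would be averaged over the testing set.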

RESULTS

In the absence of subject demographics, the mean Dice coefficient for the generated 3-D lung volumes was 0.738 ± 0.091. When subject demographics were included as an auxiliary input, the mean Dice coefficient significantly improved to 0.769 ± 0.089 (p < 0.001), and the volume prediction error was reduced from 23.5 ± 2.7% to 15.7 ± 2.9%.
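To put the reported means in perspective, the short calculation below derives the absolute and relative changes from the figures quoted above. The derived quantities are simple arithmetic on the reported means and are not stated in the abstract. The variable names are illustrative.

```python
# Reported means from the abstract (without / with subject demographics).
dice_without, dice_with = 0.738, 0.769
err_without, err_with = 23.5, 15.7  # volume prediction error, in percent

dice_gain = dice_with - dice_without                       # absolute Dice gain
err_drop_points = err_without - err_with                   # drop in percentage points
err_drop_relative = 100.0 * err_drop_points / err_without  # drop relative to the baseline error

print(f"Dice gain: {dice_gain:.3f}")
print(f"Volume error drop: {err_drop_points:.1f} points ({err_drop_relative:.1f}% relative)")
```

Under these figures the Dice coefficient rises by 0.031, and the volume error falls by 7.8 percentage points, roughly a third of its baseline value.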

CONCLUSION

Our experiment demonstrated the feasibility of reconstructing 3-D lung volumes from 2-D chest x-ray images and showed that including subject demographics as an additional input can significantly improve the accuracy of 3-D lung volume reconstruction.
