Division of Radiotherapy and Imaging, The Institute of Cancer Research, London, SM2 5NG, UK.
AI for Healthcare Centre for Doctoral Training, Imperial College London, Exhibition Road, London, SW7 2BX, UK.
Sci Rep. 2023 Jun 29;13(1):10568. doi: 10.1038/s41598-023-36712-1.
Handcrafted and deep learning (DL) radiomics are popular techniques for developing computed tomography (CT) imaging-based artificial intelligence models for COVID-19 research. However, contrast heterogeneity in real-world datasets may impair model performance. Contrast-homogeneous datasets present a potential solution. We developed a 3D patch-based cycle-consistent generative adversarial network (cycle-GAN) to synthesize non-contrast images from contrast CTs, as a data homogenization tool. We used a multi-centre dataset of 2,078 scans from 1,650 patients with COVID-19. Few studies have previously evaluated GAN-generated images with handcrafted radiomics, DL and human assessment tasks. We evaluated the performance of our cycle-GAN with all three approaches. In a modified Turing test, human experts classified images as synthetic or acquired, with a false positive rate of 67% and a Fleiss' kappa of 0.06, attesting to the photorealism of the synthetic images. However, when machine learning classifiers built on radiomic features were tested, performance decreased with the use of synthetic images. Marked percentage differences in feature values were noted between pre- and post-GAN non-contrast images. DL classification performance also deteriorated with synthetic images. Our results show that whilst GANs can produce images sufficient to pass human assessment, caution is advised before GAN-synthesized images are used in medical imaging applications.
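For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Fleiss' kappa can be computed from a table of rater counts. It is not the authors' code: the function name `fleiss_kappa` and the example ratings are hypothetical, and the data are invented purely for illustration. A value near 0, such as the 0.06 reported here, indicates agreement between raters that is little better than chance.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rater counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement: based on the overall share of ratings per category.
    p_categories = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_categories)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 4 images, 3 raters, categories (synthetic, acquired).
example_counts = [[2, 1], [1, 2], [3, 0], [0, 3]]
print(f"Fleiss' kappa: {fleiss_kappa(example_counts):.3f}")
```

The sketch uses only the standard library so that the arithmetic is visible; in practice an established statistics package would normally be preferred.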