IEEE Trans Med Imaging. 2019 May;38(5):1197-1206. doi: 10.1109/TMI.2018.2881415. Epub 2018 Nov 14.
Medical datasets are often highly imbalanced, with over-representation of prevalent conditions and poor representation of rare ones. Because of privacy concerns, it is challenging to aggregate large datasets across health care institutions. We propose synthesizing pathology in medical images as a means to overcome these challenges. We implement a deep convolutional generative adversarial network (DCGAN) to create synthesized chest X-rays from a modest-sized labeled dataset. We use a combination of real and synthesized images to train deep convolutional neural networks (DCNNs) to detect pathology across five classes of chest X-rays. A comparative study showed that DCNNs trained on the combination of real and synthesized images can outperform similar networks trained solely on real images in pathology classification. This improvement is largely attributable to balancing the dataset with DCGAN-synthesized images, with the classes that have the fewest example images preferentially augmented.
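The abstract does not state how many synthetic images were generated per class. The following minimal sketch assumes one plausible balancing rule: every class is topped up to the size of the largest class. It shows how under-represented classes end up receiving the most synthetic images. The function names `synthetic_quota` and `balanced_counts`, the class labels, and the counts are hypothetical. The DCGAN that would actually produce the images is not shown.

```python
from collections import Counter


def synthetic_quota(class_counts, target=None):
    """Return how many synthetic images to generate for each class.

    Assumption (not taken from the paper): every class is topped up to a
    common target, by default the size of the largest class, so the
    rarest classes receive the most synthetic images.
    """
    if not class_counts:
        return {}
    if target is None:
        target = max(class_counts.values())
    # Classes already at or above the target get no synthetic images.
    return {label: max(0, target - n) for label, n in class_counts.items()}


def balanced_counts(real_labels, target=None):
    """Count real labels, compute synthetic quotas, and return both.

    Returns (quota, totals), where totals = real count + synthetic quota.
    """
    real = Counter(real_labels)
    quota = synthetic_quota(real, target)
    totals = {label: real[label] + quota[label] for label in real}
    return quota, totals


if __name__ == "__main__":
    # Hypothetical label list for five chest X-ray classes (made-up counts).
    labels = (["class_a"] * 500 + ["class_b"] * 120 + ["class_c"] * 60
              + ["class_d"] * 30 + ["class_e"] * 10)
    quota, totals = balanced_counts(labels)
    print(quota)   # {'class_a': 0, 'class_b': 380, 'class_c': 440, 'class_d': 470, 'class_e': 490}
    print(totals)  # every class reaches 500
```

Topping up to the largest class is only one choice. A smaller `target` would cap the number of synthetic images per class, which may matter if synthesized images are of uneven quality.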