Zunair Hasib, Hamza A Ben
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada.
Soc Netw Anal Min. 2021;11(1):23. doi: 10.1007/s13278-021-00731-5. Epub 2021 Feb 24.
Motivated by the lack of publicly available datasets of chest radiographs from patients who tested positive for coronavirus disease 2019 (COVID-19), we build the first-of-its-kind open dataset of high-fidelity synthetic COVID-19 chest X-ray images using an unsupervised domain adaptation approach that leverages class conditioning and adversarial training. Our contributions are twofold. First, we show considerable performance improvements on COVID-19 detection across various deep learning architectures when synthetic images are used as an additional training set. Second, we show how our image synthesis method can serve as a data anonymization tool by achieving comparable detection performance when models are trained only on synthetic data. In addition, the proposed data generation framework offers a viable solution to COVID-19 detection in particular, and to medical image classification tasks in general. Our publicly available benchmark dataset (https://github.com/hasibzunair/synthetic-covid-cxr-dataset) consists of 21,295 synthetic COVID-19 chest X-ray images. The insights gleaned from this dataset can be used for preventive actions in the fight against the COVID-19 pandemic.