Muramatsu Chisako, Nishio Mizuho, Goto Takuma, Oiwa Mikinao, Morita Takako, Yakami Masahiro, Kubo Takeshi, Togashi Kaori, Fujita Hiroshi
Faculty of Data Science, Shiga University, 1-1-1 Banba, Hikone, Shiga, 522-8522, Japan.
Preemptive Medicine and Lifestyle-Related Disease Research Center, Kyoto University Hospital, 53 Shogoin Kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan.
Comput Biol Med. 2020 Apr;119:103698. doi: 10.1016/j.compbiomed.2020.103698. Epub 2020 Mar 10.
Training of a convolutional neural network (CNN) generally requires a large dataset, but large medical image datasets are difficult to collect. The purpose of this study was to investigate the utility of synthetic images in training CNNs and to demonstrate that images of unrelated lesions can be made applicable through domain transformation. Mammograms showing 202 benign and 212 malignant masses were used for evaluation. To create synthetic data, a cycle generative adversarial network was trained with 599 lung nodules on computed tomography (CT) and 1430 breast masses on digitized mammograms (DDSM). A CNN was trained to classify masses as benign or malignant. Classification performance was compared among networks trained with the original data, augmented data, synthetic data, DDSM images, and natural images (ImageNet dataset). The results were evaluated in terms of classification accuracy and the area under the receiver operating characteristic curve (AUC). Data augmentation improved classification accuracy from 65.7% to 67.1%. The use of an ImageNet pretrained model was beneficial (79.2%). Performance improved slightly when only the synthetic images or only the DDSM images were used for pretraining (67.6% and 72.5%, respectively). When the ImageNet pretrained model was trained with the synthetic images, classification performance improved slightly (81.4%), although the difference in AUCs was not statistically significant. The synthetic images had an effect similar to that of the DDSM images. These results indicate that synthetic data generated from unrelated lesions by domain transformation can be used to increase the number of training samples.
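The two evaluation measures named above, classification accuracy and AUC, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label (1 = malignant).

    The 0.5 threshold is an illustrative assumption, not a value from the study.
    """
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the probability that a malignant case scores above a benign one.

    Every malignant/benign pair is compared; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (0 = benign, 1 = malignant); not results from the study.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.6, 0.35, 0.8, 0.9]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a change in accuracy need not correspond to a statistically significant change in AUC.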