Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Department of Pathology, GROW School for Oncology and Developmental Biology, Maastricht University Medical Center+, Maastricht, The Netherlands.
J Pathol. 2021 May;254(1):70-79. doi: 10.1002/path.5638. Epub 2021 Mar 16.
Deep learning can detect microsatellite instability (MSI) from routine histology images in colorectal cancer (CRC). However, ethical and legal barriers impede sharing of images and genetic data, hampering development of new algorithms for detection of MSI and other biomarkers. We hypothesized that histology images synthesized by conditional generative adversarial networks (CGANs) retain information about genetic alterations. To test this, we developed a 'histology CGAN' which was trained on 256 patients (training cohort 1) and 1457 patients (training cohort 2). The CGAN generated 10 000 synthetic MSI and non-MSI images which contained a range of tissue types and were deemed realistic by trained observers in a blinded study. Subsequently, we trained a deep learning MSI detector on real or synthetic images and evaluated its performance in a held-out set of 142 patients. When trained on real images from training cohort 1, this system achieved an area under the receiver operating characteristic curve (AUROC) of 0.742 [0.681, 0.854]. Training on the larger cohort 2 only marginally improved the AUROC to 0.757 [0.707, 0.869]. Training on purely synthetic data resulted in an AUROC of 0.743 [0.658, 0.801]. Training on both real and synthetic data further increased the AUROC to 0.777 [0.715, 0.821]. We conclude that synthetic histology images retain information reflecting underlying genetic alterations in colorectal cancer. Using synthetic instead of real images to train deep learning systems yields non-inferior classifiers. This approach can be used to create large shareable data sets or to augment small data sets with rare molecular features.
© 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.
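The AUROC values above are reported with intervals in brackets, but the abstract does not say how those intervals were obtained. As a minimal sketch, assuming one MSI score per patient in the held-out set and a percentile bootstrap over patients (both are assumptions, not necessarily the authors' procedure), the metric and an interval can be computed as below. The function names `auroc` and `bootstrap_ci`, their parameters, and the toy data are hypothetical and only illustrate the metric.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen MSI case (label 1)
    receives a higher score than a randomly chosen non-MSI case (label 0).
    Ties count as one half (the Mann-Whitney U formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one MSI and one non-MSI case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUROC (hypothetical helper).

    Assumption: patients are resampled with replacement; resamples that
    contain only one class are discarded because AUROC is undefined there."""
    auroc(labels, scores)  # fail early if the full set has only one class
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # values cut from each tail
    return stats[k], stats[n_boot - 1 - k]


if __name__ == "__main__":
    # Toy, made-up patient-level scores (not data from the study).
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    print(round(auroc(y, s), 3))  # 0.889
    print(bootstrap_ci(y, s))
```

With only six toy patients the bootstrap interval is very wide; the pairwise loop is quadratic in cohort size, which is harmless at the scale of a 142-patient test set. Libraries such as scikit-learn (`sklearn.metrics.roc_auc_score`) compute the same AUROC value.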