SynthEye: Investigating the Impact of Synthetic Data on Artificial Intelligence-assisted Gene Diagnosis of Inherited Retinal Disease.
Author information
Veturi Yoga Advaith, Woof William, Lazebnik Teddy, Moghul Ismail, Woodward-Court Peter, Wagner Siegfried K, Cabral de Guimarães Thales Antonio, Daich Varela Malena, Liefers Bart, Patel Praveen J, Beck Stephan, Webster Andrew R, Mahroo Omar, Keane Pearse A, Michaelides Michel, Balaskas Konstantinos, Pontikos Nikolas
Affiliations
University College London Institute of Ophthalmology, University College London, London, UK.
Moorfields Eye Hospital, London, UK.
Publication information
Ophthalmol Sci. 2022 Nov 22;3(2):100258. doi: 10.1016/j.xops.2022.100258. eCollection 2023 Jun.
PURPOSE
Rare disease diagnosis is challenging in medical image-based artificial intelligence because of a natural class imbalance in datasets, which leads to biased prediction models. Inherited retinal diseases (IRDs) are a research domain that particularly faces this issue. This study investigates whether synthetic data produced by generative adversarial networks (GANs) can improve artificial intelligence-enabled diagnosis of IRDs.
DESIGN
Diagnostic study of gene-labeled fundus autofluorescence (FAF) IRD images using deep learning.
PARTICIPANTS
Moorfields Eye Hospital (MEH) dataset of 15 692 FAF images obtained from 1800 patients with confirmed genetic diagnosis of 1 of 36 IRD genes.
METHODS
A StyleGAN2 model is trained on the IRD dataset to generate 512 × 512 resolution images. Convolutional neural networks are trained for classification using different synthetically augmented datasets, including real IRD images plus 1800 and 3600 synthetic images, and a fully rebalanced dataset. We also perform an experiment with only synthetic data. All models are compared against a baseline convolutional neural network trained only on real data.
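One way to read "fully rebalanced" is that each gene class is topped up with synthetic images until it matches the largest class. The abstract does not state the exact scheme, so the rule in the sketch below is an assumption, and the gene labels and counts are illustrative toy values rather than figures from the MEH dataset.

```python
from collections import Counter


def synthetic_images_needed(labels):
    """Return, per class, how many synthetic images would bring that class
    up to the size of the largest class (assumed rebalancing rule)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Toy gene labels for the real images (hypothetical counts, not study data).
real_labels = ["ABCA4"] * 6 + ["USH2A"] * 3 + ["RPGR"] * 1
needed = synthetic_images_needed(real_labels)
print(needed)
```

Under this rule the majority class receives no synthetic images, and the rarest classes receive the most, which is where generator quality matters most.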
MAIN OUTCOME MEASURES
We evaluated synthetic data quality using a Visual Turing Test conducted with 4 ophthalmologists from MEH. Synthetic and real images were compared using feature space visualization, similarity analysis to detect memorized images, and the Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) score for no-reference quality evaluation. Convolutional neural network diagnostic performance was determined on a held-out test set using the area under the receiver operating characteristic curve (AUROC) and Cohen's kappa (κ).
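For readers less familiar with the two performance measures, the following is a minimal pure-Python sketch of how each can be computed. It is not the study's evaluation code: the function names are hypothetical, the AUROC shown is the binary case only (the abstract reports an average AUROC without stating how it was averaged across the 36 genes), and the inputs are toy values.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def binary_auroc(is_positive, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs (hypothetical, not study data).
kappa = cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])
auroc = binary_auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(kappa, auroc)
```

Kappa is reported alongside AUROC because it penalizes agreement that could arise from class frequencies alone, which is relevant when classes are imbalanced.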
RESULTS
An average true recognition rate of 63% and fake recognition rate of 47% were obtained from the Visual Turing Test. Thus, a considerable proportion of the synthetic images were classified as real by clinical experts. Similarity analysis showed that the synthetic images were not copies of the real images, meaning the GAN was able to generalize. However, BRISQUE score analysis indicated that synthetic images were of significantly lower quality overall than real images (P < 0.05). Comparing the rebalanced model (RB) with the baseline (R), no significant change in the average AUROC and κ was found (R-AUROC = 0.86 [0.85-0.88], RB-AUROC = 0.88 [0.86-0.89], R-κ = 0.51 [0.49-0.53], and RB-κ = 0.52 [0.50-0.54]). The synthetic data trained model (S) achieved performance similar to the baseline (S-AUROC = 0.86 [0.85-0.87], S-κ = 0.48 [0.46-0.50]).
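The two Visual Turing Test rates can be illustrated with a short sketch. It assumes, since the abstract does not define the terms, that the true recognition rate is the fraction of real images judged real and the fake recognition rate is the fraction of synthetic images judged synthetic. The function name and the responses are hypothetical toy values.

```python
def recognition_rates(responses):
    """responses: list of (is_real, judged_real) booleans, one per image shown.
    Returns (true_recognition_rate, fake_recognition_rate) under the
    assumed definitions described above."""
    real = [judged for is_real, judged in responses if is_real]
    fake = [judged for is_real, judged in responses if not is_real]
    true_rate = sum(real) / len(real)                    # real judged real
    fake_rate = sum(not j for j in fake) / len(fake)     # synthetic judged synthetic
    return true_rate, fake_rate


# Toy responses from one grader (hypothetical, not study data).
responses = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, True),
]
true_rate, fake_rate = recognition_rates(responses)
print(true_rate, fake_rate)
```

Under these definitions, a fake recognition rate of 47% would mean that 53% of synthetic images were judged to be real, which is consistent with the statement that a considerable proportion passed as real.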
CONCLUSIONS
Synthetic generation of realistic IRD FAF images is feasible. Synthetic data augmentation does not deliver improvements in classification performance. However, synthetic data alone deliver performance similar to that of real data, and hence may be useful as a proxy for real data. Proprietary or commercial disclosure may be found after the references.