Dee William, Alaaeldin Ibrahim Rana, Marouli Eirini
Digital Environment Research Institute (DERI), Queen Mary University of London, London, United Kingdom.
Centre for Oral Immunobiology and Regenerative Medicine, Institute of Dentistry, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom.
PLoS One. 2024 Dec 26;19(12):e0310417. doi: 10.1371/journal.pone.0310417. eCollection 2024.
Deep learning techniques are increasingly used to classify medical imaging data with high accuracy. However, because training data are often limited, these models can lack the generalizability needed to predict unseen test data, produced in different domains, with comparable performance. This study focuses on thyroid histopathology image classification and investigates whether a Generative Adversarial Network (GAN), trained with just 156 patient samples, can produce high-quality synthetic images that sufficiently augment the training data and improve overall model generalizability. Using a StyleGAN2 approach, the generative network produced images with a Fréchet Inception Distance (FID) score of 5.05, matching state-of-the-art GAN results in non-medical domains with comparable dataset sizes. Augmenting the training data with these GAN-generated images increased model generalizability when tested on external data sourced from three separate domains, improving overall precision and AUC by 7.45% and 7.20%, respectively, compared with a baseline model. Most importantly, this improvement was observed on minority-class images, tumour subtypes known to suffer from high inter-observer variability when classified by trained pathologists.
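The abstract describes two steps that are easy to misread as one: scoring the realism of the StyleGAN2 output with FID, and then mixing the synthetic images into the classifier's training set. The sketch below illustrates both steps in PyTorch using torchmetrics' FrechetInceptionDistance; it is a minimal illustration under stated assumptions, not the authors' pipeline. The directory names, batch size, and the 2048-dimensional Inception feature setting are illustrative assumptions not taken from the paper.

```python
# Minimal sketch: score GAN-generated patches with FID, then train on
# the union of real and synthetic patches. Paths are hypothetical.
import torch
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms
from torchmetrics.image.fid import FrechetInceptionDistance

to_tensor = transforms.Compose([
    transforms.Resize((299, 299)),  # Inception-v3 input size used by FID
    transforms.ToTensor(),          # float images in [0, 1]
])

# Hypothetical folder layout: one subdirectory per tumour subtype.
real = datasets.ImageFolder("data/real_patches", transform=to_tensor)
fake = datasets.ImageFolder("data/stylegan2_patches", transform=to_tensor)

# Step 1: FID between real and synthetic patches (lower is better; the
# paper reports 5.05). normalize=True tells torchmetrics the inputs are
# floats in [0, 1] rather than uint8.
fid = FrechetInceptionDistance(feature=2048, normalize=True)
for imgs, _ in DataLoader(real, batch_size=32):
    fid.update(imgs, real=True)
for imgs, _ in DataLoader(fake, batch_size=32):
    fid.update(imgs, real=False)
print(f"FID: {fid.compute().item():.2f}")

# Step 2: GAN-based augmentation. The classifier trains on the combined
# dataset instead of the real patches alone.
train_loader = DataLoader(ConcatDataset([real, fake]),
                          batch_size=32, shuffle=True)
```

In practice the synthetic set would be generated per class, so that under-represented subtypes (the minority classes the abstract highlights) receive proportionally more synthetic examples.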