IEEE Trans Med Imaging. 2023 Jul;42(7):1969-1981. doi: 10.1109/TMI.2022.3221724. Epub 2023 Jun 30.
Data-driven machine learning is currently considered one of the best choices for clinical pathology analysis, and its success depends on a sufficient supply of digitized slides, particularly those with deep annotations. Although centralized training on a large dataset may be more reliable and generalize better, the slides to be examined are more often than not collected from many distributed medical institutes. This brings its own challenges, the most important being to assure the privacy and security of incoming data samples. In histopathology imaging, the universal stain-variation issue adds to the difficulty of building an automatic system, because different clinical institutions produce distinct stain styles. To address these two challenges in AI-based histopathology diagnosis, this work proposes a novel conditional Generative Adversarial Network (GAN) with one orchestration generator and multiple distributed discriminators to handle stain-style normalization across multiple clients. Implemented within a Federated Learning (FL) paradigm, the framework preserves data privacy and security. In addition, a novel temporal self-distillation regularization scheme further enhances the training consistency and stability of the distributed system. Empirically, on large cohorts of histopathology datasets used as a benchmark, the proposed model closely matches the performance of conventional centralized learning. It also outperforms state-of-the-art stain-style transfer methods on the downstream FL image classification task, with an accuracy increase of over 20.0% compared with the baseline classification model.
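As a rough illustration of the setup described above, the following sketch shows one plausible way a central generator could combine feedback from several client-side discriminators with a temporal self-distillation penalty. The abstract does not give the loss formulation, so everything here is an assumption: the function names, the simple averaging over clients, the mean-squared form of the penalty, and the `distill_weight` parameter are all hypothetical. A real system would exchange gradients rather than scalar losses, but the privacy idea is the same: images stay on each client.

```python
from typing import Sequence


def temporal_distillation_penalty(
    current: Sequence[float], previous: Sequence[float]
) -> float:
    """Mean squared difference between the generator's current output and the
    output of an earlier snapshot of the same generator (assumed form)."""
    if len(current) != len(previous) or len(current) == 0:
        raise ValueError("outputs must be non-empty and of equal length")
    return sum((c - p) ** 2 for c, p in zip(current, previous)) / len(current)


def generator_objective(
    client_adv_losses: Sequence[float],
    current: Sequence[float],
    previous: Sequence[float],
    distill_weight: float = 0.5,
) -> float:
    """Average the adversarial losses reported by each client's discriminator,
    then add the weighted temporal penalty (hypothetical aggregation rule)."""
    if len(client_adv_losses) == 0:
        raise ValueError("at least one client loss is required")
    mean_adv = sum(client_adv_losses) / len(client_adv_losses)
    penalty = temporal_distillation_penalty(current, previous)
    return mean_adv + distill_weight * penalty


# Three clients report scalar adversarial losses; no image data is shared.
losses_from_clients = [0.5, 1.0, 1.5]
objective = generator_objective(
    losses_from_clients, current=[1.0, 2.0], previous=[1.0, 0.0]
)
```

The penalty term discourages the generator's output from drifting sharply between training rounds, which is one way to read the abstract's claim about training consistency and stability.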