Lutnick Brendon, Manthey David, Becker Jan U, Zuckerman Jonathan E, Rodrigues Luis, Jen Kuang-Yu, Sarder Pinaki
Department of Pathology and Anatomical Sciences, SUNY Buffalo, Buffalo, NY, USA.
Kitware Incorporated, Clifton Park, NY, USA.
J Pathol Inform. 2022 May 21;13:100101. doi: 10.1016/j.jpi.2022.100101. eCollection 2022.
The largest bottleneck in developing convolutional neural network (CNN) models for computational pathology is the collection and curation of diverse training datasets. Training CNNs requires large cohorts of image data, and model generalizability depends on the heterogeneity of the training data. Including data from multiple centers enhances the generalizability of CNN-based models, but doing so is hindered by the logistical challenges of sharing medical data. In this paper, we explore the feasibility of training our recently developed cloud-based segmentation tool (Histo-Cloud) using federated learning. Using a dataset of renal tissue biopsies, we show that federated training on datasets from three institutions to segment interstitial fibrosis and tubular atrophy (IFTA) was not found to differ from training on the same data pooled on one server, when both models were tested on data from a fourth (holdout) institution. Further, a model trained to segment glomeruli on a federated dataset (split by staining) shows similar performance.
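The abstract compares federated training against training on pooled data but does not state which aggregation scheme Histo-Cloud uses. The sketch below assumes federated averaging (FedAvg), a common choice, and is a hypothetical toy, not the authors' implementation. Each "site" holds its own data and fits `y ≈ w * x` locally. The server averages the returned weights, weighted by local sample count. The function names (`local_step`, `fed_avg_round`, `train_federated`, `train_pooled`) and the toy data are illustrative only.

```python
# Hypothetical federated-averaging (FedAvg) toy; not the Histo-Cloud code.
# Each site keeps its data locally and only shares the updated weight w.


def local_step(w, xs, ys, lr):
    """One full-batch gradient step on mean squared error for y = w * x."""
    n = len(xs)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return w - lr * grad


def fed_avg_round(w, sites, lr):
    """Broadcast w, train locally at each site, return the sample-weighted mean."""
    total = sum(len(xs) for xs, _ in sites)
    return sum(len(xs) / total * local_step(w, xs, ys, lr) for xs, ys in sites)


def train_federated(sites, rounds=50, lr=0.05):
    w = 0.0
    for _ in range(rounds):
        w = fed_avg_round(w, sites, lr)
    return w


def train_pooled(sites, rounds=50, lr=0.05):
    """Baseline: copy every site's data to one server and train there."""
    xs = [x for site_xs, _ in sites for x in site_xs]
    ys = [y for _, site_ys in sites for y in site_ys]
    w = 0.0
    for _ in range(rounds):
        w = local_step(w, xs, ys, lr)
    return w


if __name__ == "__main__":
    # Three hypothetical sites whose data all follow y = 3x.
    sites = [
        ([1.0, 2.0], [3.0, 6.0]),
        ([0.5, 1.5, 2.5], [1.5, 4.5, 7.5]),
        ([3.0], [9.0]),
    ]
    print(train_federated(sites), train_pooled(sites))
```

With one full-batch local step per round, the sample-weighted average of the site updates equals the pooled gradient step, so the two runs agree up to floating-point rounding. With several local epochs or a nonlinear model such as a CNN, the equivalence is no longer exact, which is why an empirical comparison like the one in this paper is needed.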