Folmsbee Jonathan, Zhang Lei, Lu Xulei, Rahman Jawaria, Gentry John, Conn Brendan, Vered Marilena, Roy Paromita, Gupta Ruta, Lin Diana, Samankan Shabnam, Dhorajiva Pooja, Peter Anu, Wang Minhua, Israel Anna, Brandwein-Weber Margaret, Doyle Scott
Department of Pathology & Anatomical Sciences, University at Buffalo SUNY, Buffalo, NY, USA.
Department of Biomedical Engineering, University at Buffalo SUNY, Buffalo, NY, USA.
J Pathol Inform. 2022 Sep 27;13:100146. doi: 10.1016/j.jpi.2022.100146. eCollection 2022.
In digital pathology, deep learning has been shown to have a wide range of applications, from cancer grading to segmenting structures like glomeruli. One of the main hurdles for digital pathology to be truly effective is the size of the dataset needed to generalize across the spectrum of possible morphologies: small datasets limit classifiers' ability to generalize. Yet larger datasets of whole slide images (WSIs) of tissue may cause network bottlenecks, as each WSI at its original magnification can be upwards of 100 000 by 100 000 pixels and over a gigabyte in file size. Compounding this problem, high-quality pathologist annotations are difficult to obtain, as the volume of annotations needed to create a classifier that can generalize would be extremely costly in terms of pathologist-hours. In this work, we use Active Learning (AL), a process for iterative interactive training, to create a modified U-net classifier on the region of interest (ROI) scale. We then compare this to Random Learning (RL), where the images added to the dataset for retraining are randomly selected. Our hypothesis is that AL yields better segmentation results than randomly selecting images to annotate. We show that after 3 iterations, AL, with an average Dice coefficient of 0.461, outperforms RL, with an average Dice coefficient of 0.375, by 0.086.
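As a minimal illustrative sketch, and not the authors' implementation, the following Python shows how a Dice coefficient between two binary masks can be computed and how the two selection strategies differ in principle. The abstract does not state how AL chooses which images to annotate, so the uncertainty criterion in `select_active` (mean closeness of predicted probabilities to 0.5) is an assumption, as are all function and parameter names.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


def select_random(n_pool, k, rng):
    """RL: pick k pool indices uniformly at random, without replacement."""
    return rng.choice(n_pool, size=k, replace=False)


def select_active(prob_maps, k):
    """AL (hypothetical criterion): pick the k ROIs whose predicted
    foreground probabilities are, on average, closest to 0.5."""
    uncertainty = np.array([
        np.mean(1.0 - 2.0 * np.abs(np.asarray(p, dtype=float) - 0.5))
        for p in prob_maps
    ])
    # Sort ascending, reverse to get most uncertain first, keep the top k.
    return np.argsort(uncertainty)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy pool of per-ROI probability maps standing in for network output.
    pool = [rng.random((8, 8)) for _ in range(10)]
    print("RL picks:", select_random(len(pool), 3, rng))
    print("AL picks:", select_active(pool, 3))
```

In either strategy, the selected ROIs would then be annotated, added to the training set, and the network retrained; only the selection step differs between the two.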