Su Ziyu, Chen Wei, Leigh Preston J, Sajjad Usama, Niu Shuo, Rezapour Mostafa, Frankel Wendy L, Gurcan Metin N, Niazi M Khalid Khan
Center for Artificial Intelligence Research, Wake Forest University, School of Medicine, Winston-Salem, NC, USA.
Department of Pathology, The Ohio State University.
Proc SPIE Int Soc Opt Eng. 2024 Feb;12933. doi: 10.1117/12.3006418. Epub 2024 Apr 3.
Current deep learning methods in histopathology are limited by the scarcity of available data and the time required to label it. Colorectal cancer (CRC) tumor budding quantification, performed using H&E-stained slides, is crucial for cancer staging and prognosis but is subject to labor-intensive annotation and human bias. Thus, acquiring a large-scale, fully annotated dataset for training a tumor budding (TB) segmentation/detection system is difficult. Here, we present a DatasetGAN-based approach that can generate an essentially unlimited number of images with TB masks from a moderate number of unlabeled images and a few annotated images. The images generated by our model closely resemble real colon tissue on H&E-stained slides. We test the performance of this model by training a downstream segmentation model, UNet++, on the generated images and masks. Our results show that the trained UNet++ model achieves reasonable TB segmentation performance, especially at the instance level. This study demonstrates the potential of developing an annotation-efficient segmentation model for automatic TB detection and quantification.