Instituto Universitario de Investigación en Tecnología Centrada en el Ser Humano, Universitat Politècnica de València, Spain.
Department of Computer Science and Artificial Intelligence, Universidad de Granada, Granada, Spain.
Comput Methods Programs Biomed. 2024 Dec;257:108472. doi: 10.1016/j.cmpb.2024.108472. Epub 2024 Oct 28.
Currently, prostate cancer (PCa) diagnosis relies on the human analysis of prostate biopsy Whole Slide Images (WSIs) using the Gleason score. Since this process is error-prone and time-consuming, recent advances in machine learning have promoted the use of automated systems to assist pathologists. Unfortunately, labeled datasets for training and validation are scarce due to the need for expert pathologists to provide ground-truth labels.
This work introduces a new prostate histopathological dataset named CrowdGleason, which consists of 19,077 patches from 1045 WSIs with various Gleason grades. The dataset was annotated using a crowdsourcing protocol involving seven pathologists-in-training to distribute the labeling effort. To provide a baseline analysis, two crowdsourcing methods based on Gaussian Processes (GPs) were evaluated for Gleason grade prediction: SVGPCR, which learns a model from the CrowdGleason dataset, and SVGPMIX, which combines data from the public dataset SICAPv2 and the CrowdGleason dataset. The performance of these methods was compared with other crowdsourcing and expert label-based methods through comprehensive experiments.
The results demonstrate that our GP-based crowdsourcing approach outperforms other methods for aggregating crowdsourced labels (κ=0.7048±0.0207 for SVGPCR vs. κ=0.6576±0.0086 for SVGP with majority voting). SVGPCR trained with crowdsourced labels performs better than GP trained with expert labels from SICAPv2 (κ=0.6583±0.0220) and outperforms most individual pathologists-in-training (mean κ=0.5432). Additionally, SVGPMIX trained with a combination of SICAPv2 and CrowdGleason achieves the highest performance on both datasets (κ=0.7814±0.0083 and κ=0.7276±0.0260).
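To make the majority-voting baseline and the κ figures more concrete, the following is a minimal sketch of how crowdsourced labels might be aggregated by majority vote and then scored against reference labels with Cohen's kappa. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which kappa variant was used (the unweighted form is assumed here), and the label encoding, tie-breaking rule, toy votes, and the names `majority_vote` and `cohen_kappa` are all hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among one patch's annotations.

    Ties are broken in favour of the smallest label; this rule is an
    arbitrary assumption made for the sketch.
    """
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical encoding: 0 = non-cancerous, 1 = Gleason grade 3,
# 2 = Gleason grade 4, 3 = Gleason grade 5.
# Each inner list holds the labels that different annotators gave one patch;
# not every annotator labels every patch.
crowd_votes = [
    [0, 0, 1],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [0, 1],
    [3, 3],
]
reference = [0, 1, 2, 3, 1, 3]  # toy reference labels, not real data

aggregated = [majority_vote(v) for v in crowd_votes]
kappa = cohen_kappa(reference, aggregated)
print(aggregated, round(kappa, 4))
```

Majority voting discards information about how reliable each annotator is, which is the limitation that annotator-aware models such as SVGPCR are meant to address.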
The experiments show that the CrowdGleason dataset can be successfully used for training and validating supervised and crowdsourcing methods. Furthermore, the crowdsourcing methods trained on this dataset obtain competitive results against those using expert labels. Interestingly, combining expert and non-expert labels opens the door to large-scale labeling efforts that involve both expert and non-expert pathologist annotators.