Instituto Universitario de Investigación en Tecnología Centrada en el Ser Humano, Universitat Politècnica de València, Valencia, Spain.
Department of Computer Science and Artificial Intelligence, University of Granada, 18010 Granada, Spain.
Artif Intell Med. 2023 Nov;145:102686. doi: 10.1016/j.artmed.2023.102686. Epub 2023 Oct 17.
Digital Pathology (DP) has grown significantly in recent years and has become an essential tool for the diagnosis and prognosis of tumors. The availability of Whole Slide Images (WSIs) and the implementation of Deep Learning (DL) algorithms have paved the way for Artificial Intelligence (AI) systems that support the diagnostic process. These systems require extensive and varied training data to be successful. However, creating labeled datasets in histopathology is laborious and time-consuming. We have developed a crowdsourcing-multiple instance labeling/learning protocol and applied it to the creation and use of the CR-AI4SkIN dataset. CR-AI4SkIN contains 271 WSIs of 7 Cutaneous Spindle Cell (CSC) neoplasms, with expert and non-expert labels at both the region and WSI levels. It is the first dataset of this type of neoplasm to be made available. The regions selected by the experts are used to train an automatic extractor of Regions of Interest (ROIs) from WSIs. To produce the embedding of each WSI, representations of the patches within the ROIs are obtained with a contrastive learning method and then combined. Finally, these embeddings are fed to a Gaussian process-based crowdsourcing classifier that learns from the noisy non-expert WSI labels. We validate our crowdsourcing-multiple instance learning method on the CR-AI4SkIN dataset on a binary classification problem (malignant vs. benign). The proposed method obtains an F1 score of 0.7911 on the test set, outperforming three widely used aggregation methods for crowdsourcing tasks. Furthermore, it also outperforms the supervised model trained with expert labels on the test set (F1 score = 0.6035). These promising results support the proposed crowdsourcing-multiple instance learning annotation protocol. They also validate the automatic extraction of regions of interest and the use of contrastive embeddings and Gaussian process classification for crowdsourcing classification tasks.
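The abstract names three data-handling steps without detailing them: combining patch embeddings into a single WSI embedding, dealing with several noisy non-expert labels per WSI, and scoring the binary malignant vs. benign task with F1. The Python sketch below illustrates simple stand-ins for each, under stated assumptions. Mean pooling is assumed as the combination rule (the abstract only says the patch representations are "combined"); majority voting is shown as a common crowdsourcing aggregation baseline (it is not the proposed Gaussian process classifier, and we do not claim it is one of the three baselines compared); and F1 is computed with malignant (label 1) as the positive class, which may differ from the averaging used in the paper. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence


def mean_pool(patch_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine patch embeddings into one WSI-level vector by per-dimension averaging.

    ASSUMPTION: mean pooling is an illustrative choice; the abstract only says
    the contrastive patch representations are "combined".
    """
    if not patch_embeddings:
        raise ValueError("a WSI needs at least one patch embedding")
    n = len(patch_embeddings)
    dim = len(patch_embeddings[0])
    return [sum(patch[d] for patch in patch_embeddings) / n for d in range(dim)]


def majority_vote(annotations: Sequence[Optional[int]]) -> int:
    """Aggregate noisy non-expert labels for one WSI (1 = malignant, 0 = benign).

    None marks an annotator who did not label this WSI. Ties are broken toward
    malignant (1), a HYPOTHETICAL conservative choice for this sketch.
    """
    votes = Counter(a for a in annotations if a is not None)
    if not votes:
        raise ValueError("no annotations for this WSI")
    return 1 if votes[1] >= votes[0] else 0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (ASSUMPTION: malignant = 1 is positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data only; not taken from CR-AI4SkIN.
    print(mean_pool([[0.0, 2.0], [1.0, 4.0]]))  # [0.5, 3.0]
    crowd = [[1, 1, 0], [0, None, 0], [1, 0, None], [0, 0, 1]]
    consensus = [majority_vote(a) for a in crowd]
    print(consensus)  # [1, 0, 1, 0]
    expert = [1, 0, 0, 1]
    print(f1_score(expert, consensus))  # 0.5
```

Majority voting treats every annotator as equally reliable and discards disagreement before training; a crowdsourcing classifier such as the Gaussian process-based one described in the abstract is instead trained on the noisy non-expert labels themselves, which is the setting in which the abstract reports it outperforming aggregation baselines.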