Department of Computing, Federal University of Technology - Parana, 1640, Alberto Carazzai Av., Cornelio Procopio, PR 86300-000, Brazil.
Department of Computing, Federal University of Technology - Parana, 1640, Alberto Carazzai Av., Cornelio Procopio, PR 86300-000, Brazil; Department of Computing, Federal University of Sao Carlos, km 235, Rodovia Washington Luis, Sao Carlos, SP 13565-905, Brazil; Institute of Computing, State University of Campinas, 1251, Albert Einstein Ave, Cidade Universitária, Campinas, SP 13083-852, Brazil.
Comput Methods Programs Biomed. 2022 Nov;226:107122. doi: 10.1016/j.cmpb.2022.107122. Epub 2022 Sep 11.
According to the National Cancer Institute, non-melanoma skin cancer and melanoma are the most frequent malignant tumors in Brazil. Despite its lower incidence, melanoma grows faster and is more lethal. In recent years, several computer vision studies have aimed to assist in the early diagnosis of skin cancer. Although deep learning approaches are widely used and give good results, they require a large amount of annotated data and incur considerable computational cost when training the model. The present work therefore explores active learning approaches that select a small set of more informative samples for training the classifier. To that end, different selection criteria are considered in order to obtain more effective and efficient classifiers for skin lesions.
We perform an extensive experimental evaluation considering three datasets and different learning strategies and scenarios for validation. In addition to data augmentation, we evaluated two segmentation strategies considering the U-net CNN model and the Fully Convolutional Networks (FCN) with a manual expert review. We also analyzed the best (handcrafted and deep) features that describe each skin lesion and the most suitable classifiers and combinations (extractor-classifier) for this context. The active learning approach evaluated different criteria based on uncertainty, diversity, and representativeness to select the most informative samples. The strategies used were Decreasing Boundary Edges, Entropy, Least Confidence, Margin Sampling, Minimum-Spanning Tree Boundary Edges, and Root-Distance based Sampling.
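The three uncertainty-based criteria named above (Entropy, Least Confidence, and Margin Sampling) have standard textbook definitions over a classifier's predicted class probabilities. The following is a minimal sketch of those definitions, not the authors' implementation; the function names and the `select_most_informative` helper are hypothetical, and the probability rows are made-up illustrative values.

```python
import math


def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - max(probs)


def margin(probs):
    """Gap between the two most probable classes; smaller means more uncertain."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy (natural log); higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_informative(prob_rows, k, criterion="margin"):
    """Return indices of the k most uncertain rows (hypothetical helper)."""
    if criterion == "margin":
        # Negate so that, for every criterion, a higher score means more uncertain.
        scores = [-margin(p) for p in prob_rows]
    elif criterion == "least_confidence":
        scores = [least_confidence(p) for p in prob_rows]
    elif criterion == "entropy":
        scores = [entropy(p) for p in prob_rows]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    ranked = sorted(range(len(prob_rows)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Illustrative (made-up) predicted probabilities for three unlabeled samples.
example_probs = [[0.90, 0.10], [0.55, 0.45], [0.70, 0.30]]
picked = select_most_informative(example_probs, k=2, criterion="margin")
```

In this sketch the second sample is ranked first under all three criteria, because its two class probabilities are closest together. The diversity- and representativeness-based strategies listed above rely on the structure of the feature space and are not covered here.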
The best performance was observed with FCN segmentation followed by manual correction by the specialist, the Border-Interior Classification (BIC) extractor, and the Random Forest (RF) classifier. Among the active learning strategies, Margin Sampling presented the best classification accuracies (about 93%) using only 35% of the training set, whereas the traditional learning approach requires the entire set.
The results show that the selection strategies reach high accuracies faster (in fewer learning iterations) and with fewer labeled samples than the traditional learning approach. Hence, active learning can contribute significantly to the diagnosis of skin lesions by reducing specialists' annotation costs.
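The iterative process referred to above can be illustrated with a generic pool-based active learning loop that queries the smallest-margin samples until a labeling budget is reached. This is a sketch under stated assumptions rather than the pipeline used in the study: `active_learning_loop`, `train_threshold`, the one-dimensional toy pool, and the oracle are all hypothetical stand-ins (the study itself used image features and classifiers such as Random Forest).

```python
import math


def active_learning_loop(pool, oracle, train, batch_size, budget, seed_indices):
    """Label up to `budget` pool items, querying smallest-margin items first."""
    labeled = {i: oracle(i) for i in seed_indices}
    while len(labeled) < budget:
        # Retrain on everything labeled so far.
        predict_proba = train([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break

        def margin_of(i):
            top = sorted(predict_proba(pool[i]), reverse=True)
            return top[0] - top[1]

        # Smallest margin first: these are the most ambiguous samples.
        unlabeled.sort(key=margin_of)
        n_query = min(batch_size, budget - len(labeled))
        for i in unlabeled[:n_query]:
            labeled[i] = oracle(i)  # the "specialist" provides the label
    return labeled


def train_threshold(labeled_pairs):
    """Toy 1-D classifier (assumption): threshold halfway between class means."""
    xs0 = [x for x, y in labeled_pairs if y == 0]
    xs1 = [x for x, y in labeled_pairs if y == 1]
    thr = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

    def predict_proba(x):
        p1 = 1.0 / (1.0 + math.exp(-10.0 * (x - thr)))
        return [1.0 - p1, p1]

    return predict_proba


# Toy pool of ten points in [0, 0.9]; the true class boundary is at 0.5.
pool = [i / 10 for i in range(10)]


def oracle(i):
    return int(pool[i] >= 0.5)


# Seeds cover both classes; budget of 4 labels out of 10 pool items.
labeled = active_learning_loop(
    pool, oracle, train_threshold, batch_size=1, budget=4, seed_indices=[0, 9]
)
```

The budget parameter plays the role of the labeled fraction reported above: the loop stops once the chosen share of the pool has been annotated, and the queried points concentrate near the decision boundary rather than being drawn at random.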