Katsch Florian, Rinner Christoph, Tschandl Philipp
Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria.
Department of Dermatology, Medical University of Vienna, Vienna, Austria.
Dermatol Pract Concept. 2022 Jul 1;12(3):e2022126. doi: 10.5826/dpc.1203a126. eCollection 2022 Jul.
Neural-network classification of dermatoscopic images performs comparably to clinicians under experimental conditions, but it can be affected by artefacts such as skin markings or rulers. It is unknown whether specialized neural network architectures are more robust to such artefacts.
To analyze the robustness of three neural network architectures: ResNet-34, Faster R-CNN, and Mask R-CNN.
We identified common artefacts in the HAM10000, PH2, and 7-point criteria evaluation datasets and established a template-based method to superimpose artefacts on dermatoscopic images. The networks were trained on the HAM10000 dataset with and without superimposed artefacts, and their robustness against artefacts in test images was then analyzed. Performance was assessed via the area under the precision-recall curve and classification results.
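The abstract does not describe how the templates are stored or placed. The following is a minimal sketch, not the authors' implementation, that assumes each artefact template is an RGBA patch whose alpha channel marks the artefact pixels and that it is alpha-blended onto the image at a chosen position. A second helper picks which images receive an artefact, either a random fraction of all images or only images of one class (the class-biased training setting). Both function names, `superimpose_artefact` and `choose_images`, are hypothetical.

```python
import numpy as np


def superimpose_artefact(image, template, top, left):
    """Alpha-blend an RGBA artefact template onto an RGB image at (top, left).

    Assumption: image is uint8 (H, W, 3); template is uint8 (h, w, 4) and its
    fourth channel is the artefact's opacity mask. Returns a new uint8 array.
    """
    h, w = template.shape[:2]
    H, W = image.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("template does not fit inside the image at this position")
    out = image.astype(np.float32)  # astype returns a copy, so the input is untouched
    rgb = template[..., :3].astype(np.float32)
    alpha = template[..., 3:4].astype(np.float32) / 255.0  # (h, w, 1) broadcasts over RGB
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * rgb + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def choose_images(labels, fraction, target_class=None, seed=0):
    """Return sorted indices of the images that should receive an artefact.

    With target_class=None, a random `fraction` of all images is chosen.
    Otherwise only images of `target_class` are eligible (class-biased setting),
    and `fraction` refers to that class alone. This selection rule is assumed.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    if target_class is None:
        eligible = np.arange(len(labels))
    else:
        eligible = np.flatnonzero(labels == target_class)
    k = int(round(fraction * len(eligible)))
    return np.sort(rng.choice(eligible, size=k, replace=False))
```

For example, `choose_images(labels, 0.4)` marks 40% of a test set for artefact insertion, which matches the threshold above which the abstract reports significant performance changes. Fixing the seed keeps the chosen subset reproducible across models.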
ResNet-34 and Faster R-CNN models trained on regular images performed worse than Mask R-CNN on images with superimposed artefacts. When artefacts were added to all test images, the area under the precision-recall curve decreased by 0.030 for ResNet-34 and by 0.045 for Faster R-CNN, compared with only 0.011 for Mask R-CNN. However, changes in model performance became significant only when 40% or more of the images had superimposed artefacts. Performance also decreased when training was biased by selectively superimposing artefacts on images belonging to a certain class.
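The abstract does not state which estimator of the area under the precision-recall curve was used or how classes were averaged. The following sketch assumes one common estimator, average precision (the mean precision at the rank of each positive example), computed one-vs-rest per class and macro-averaged. A reported decrease would then be the difference between the score on clean test images and the score on the same images with artefacts. The function names are hypothetical, and tied scores are not grouped, which is a simplification.

```python
import numpy as np


def average_precision(y_true, scores):
    """Estimate the area under the precision-recall curve as average precision.

    Sorts examples by descending score and averages precision@k over the
    ranks k at which a positive example appears. Ties are not grouped.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = y_true.sum()
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].sum() / n_pos)


def macro_auprc(labels, probs):
    """One-vs-rest average precision per class, averaged over classes (assumed scheme).

    labels: integer class ids, shape (n,); probs: class probabilities, shape (n, C).
    """
    labels = np.asarray(labels)
    probs = np.asarray(probs)
    return float(np.mean([average_precision(labels == c, probs[:, c])
                          for c in range(probs.shape[1])]))


# Toy example (illustrative data only, not from the study): an artefact lowers
# the score of one positive image, so it drops below a negative in the ranking.
y = [1, 0, 1, 0]
clean_scores = [0.9, 0.8, 0.7, 0.1]
artefact_scores = [0.6, 0.8, 0.7, 0.1]
drop = average_precision(y, clean_scores) - average_precision(y, artefact_scores)
print(round(drop, 4))  # prints 0.25
```

Without ties, this definition matches the usual average-precision estimator. A trapezoidal area under the curve can give slightly different values, so numbers are comparable only when computed with the same estimator.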
Because Mask R-CNN showed the smallest decrease in performance when confronted with artefacts, instance segmentation architectures may help counter the effects of artefacts, which warrants further research on related architectures. Our artefact insertion mechanism could be useful for future research.