Petrie T C, Larson C, Heath M, Samatham R, Davis A, Berry E G, Leachman S A
Department of Dermatology, Oregon Health & Science University, Portland, Oregon, USA.
Skin Health Dis. 2021 Mar 19;1(2):e19. doi: 10.1002/ski2.19. eCollection 2021 Jun.
Many classifiers have been developed that can distinguish different types of skin lesions (e.g., benign nevi, melanoma) with varying degrees of success. However, even successfully trained classifiers may perform poorly on images that include artefacts. While the problems created by hair and ink markings have been published, to our knowledge no quantitative measurements of the effect of blur, colour and lighting variations on classification accuracy have yet been reported.
We created a system that measures the impact of various artefacts on machine learning accuracy. Our objectives were to (1) quantitatively identify the most egregious artefacts and (2) demonstrate how to assess a classification algorithm's accuracy when input images include artefacts.
We injected artefacts into dermatologic images using techniques that could each be controlled with a single variable, which allowed us to evaluate quantitatively the impact of each artefact on accuracy. We trained two convolutional neural networks on two different binary classification tasks and measured the impact of the artefacts on the classification of dermoscopy images over a range of parameter values. The area under the curve and specificity-at-a-given-sensitivity values were measured for each artefact at each parameter value.
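As an illustration of artefacts that are each controlled by a single variable, the following is a minimal sketch and not the authors' implementation. It assumes an image is a nested list of RGB tuples with channel values in [0, 1]; the function names `shift_hue` and `box_blur`, the hue rotation and the box filter are all illustrative choices.

```python
import colorsys


def shift_hue(image, shift):
    """Rotate the hue of every pixel by `shift`, a fraction of a full turn.

    `shift` is the single control variable: 0.0 leaves the image unchanged.
    """
    shifted = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
        shifted.append(new_row)
    return shifted


def box_blur(image, radius):
    """Replace each pixel by the mean of its (2 * radius + 1) square window.

    `radius` is the single control variable: 0 leaves the image unchanged.
    Windows are clipped at the image border.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        new_row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            new_row.append(
                tuple(sum(p[c] for p in window) / len(window) for c in range(3))
            )
        blurred.append(new_row)
    return blurred
```

Because each function takes exactly one parameter besides the image, the same evaluation loop can be repeated over a list of parameter values for any artefact.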
General blur had the strongest negative effect on the benign nevi versus melanoma task. Conversely, shifting the hue towards blue had a more pronounced effect on the seborrheic keratosis versus melanoma task.
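The comparison of artefacts rests on two metrics measured at each parameter value. The sketch below shows one way to compute them and to sweep a parameter, under stated assumptions: labels are 0 (negative) or 1 (positive), a higher score means "more likely positive", and `predict`, `artefact` and `sweep` are hypothetical names standing in for a trained classifier and an artefact generator. It is not the authors' code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Best specificity over thresholds whose sensitivity meets the target.

    A case is called positive when its score is >= the threshold.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(n < threshold for n in neg) / len(neg)
            best = max(best, specificity)
    return best


def sweep(predict, images, labels, artefact, parameter_values,
          target_sensitivity=0.9):
    """Apply `artefact` at each parameter value and record both metrics."""
    results = {}
    for value in parameter_values:
        scores = [predict(artefact(image, value)) for image in images]
        results[value] = (
            auc(labels, scores),
            specificity_at_sensitivity(labels, scores, target_sensitivity),
        )
    return results
```

Plotting the two metrics against the parameter value for each artefact makes it possible to see which artefact degrades a given task fastest.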
Classifiers should either mitigate artefacts or detect them. Images should be excluded from diagnosis/recommendation when artefacts are present in amounts outside the machine-perceived quality range. Failure to do so will reduce accuracy and impede approval from regulatory agencies.
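One possible way to detect an artefact before classification is a simple quality gate. The sketch below uses the variance of a Laplacian response as a sharpness score, which is a common blur heuristic and not a method taken from this study. It assumes a greyscale image given as a nested list of floats of at least 3 x 3 pixels; `min_sharpness` is a hypothetical threshold that would have to be calibrated against the range over which a given classifier remains accurate.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values suggest few sharp edges, which may indicate blur.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def accept_for_classification(gray, min_sharpness):
    """Return True only if the image is sharp enough to be classified.

    `min_sharpness` is an assumed, classifier-specific threshold.
    """
    return laplacian_variance(gray) >= min_sharpness
```

An image that fails the gate would be returned for re-capture instead of being passed to the classifier; analogous gates could be sketched for colour or lighting.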