ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria.
Faculty of Medicine, University of Queensland, Herston 4006, Australia.
Sci Data. 2018 Aug 14;5:180161. doi: 10.1038/sdata.2018.161.
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.