Thomas Simon M, Lefevre James G, Baxter Glenn, Hamilton Nicholas A
Institute for Molecular Bioscience, The University of Queensland, St Lucia, Australia.
MyLab Pathology, Salisbury, Australia.
Data Brief. 2021 Nov 19;39:107587. doi: 10.1016/j.dib.2021.107587. eCollection 2021 Dec.
Densely labelled segmentation data for digital pathology images is costly to produce but invaluable for training effective machine learning models. We make available 290 hand-annotated histopathology tissue sections of the three most common skin cancers: basal cell carcinoma (BCC), squamous cell carcinoma (SCC) and intraepidermal carcinoma (IEC). These non-melanoma skin cancers constitute over 90% of all skin cancer diagnoses, so this dataset gives the scientific community an opportunity to benchmark analytic methodologies on a significant portion of the dermatopathology workflow. The data represents typical cases of the three cancer types (not requiring a differential diagnosis) across shave, punch and excision biopsy contexts. Each image is accompanied by a segmentation mask which divides the section into 12 tissue types, specifically: keratin, epidermis, papillary dermis, reticular dermis, hypodermis, inflammation, glands, hair follicles and background, as well as BCC, SCC and IEC. Also included are cancer margin measurements, as a step towards automated assessment of surgical margin clearance and tumour invasion. This leaves open many opportunities for researchers to utilize or extend the dataset, building upon recent work on image analysis problems in skin cancer (Thomas et al., 2021).
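As an illustration of how a 12-class segmentation mask of this kind might be summarized, the following minimal sketch computes the fraction of pixels assigned to each tissue type. It assumes the mask is available as a 2D grid of integer labels; the label order in `TISSUE_CLASSES`, the helper `class_fractions` and the toy mask are hypothetical and do not reflect the dataset's actual file format or encoding.

```python
from collections import Counter

# The 12 classes named in the abstract. The integer index assigned to each
# class here is an assumption for illustration, not the dataset's encoding.
TISSUE_CLASSES = [
    "keratin", "epidermis", "papillary dermis", "reticular dermis",
    "hypodermis", "inflammation", "glands", "hair follicles",
    "background", "BCC", "SCC", "IEC",
]


def class_fractions(mask):
    """Return {class name: fraction of pixels} for a 2D mask of integer labels."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {TISSUE_CLASSES[label]: n / total for label, n in counts.items()}


# Toy 2x4 mask (hypothetical): background (8), epidermis (1) and BCC (9).
toy_mask = [
    [8, 8, 1, 1],
    [8, 9, 9, 1],
]
fractions = class_fractions(toy_mask)
for name, frac in sorted(fractions.items()):
    print(f"{name}: {frac:.3f}")
```

Per-class pixel fractions of this sort are a common first check on class balance before training a segmentation model, since cancer classes typically occupy far fewer pixels than dermis or background.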