Large-Scale Dermatopathology Dataset for Lesion Segmentation: Model Development and Analysis.

Authors

Chong Yosep, Park Daseul, Ahn Youngbin, Kwak Yoonjin, Park Seyeon, Back Seung Wan, Lee Changwoo, Park Gyeongsin, Alam Mohammad Rizwan, Kim Binna, Jang Kee-Taek, Han Nayoung, Yoo Chong Woo, Lee Jonghyuck, Lee Cheol, Kim Young-Gon

Affiliations

Department of Hospital Pathology, College of Medicine, The Catholic University of Korea, Seoul, Korea.

Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea.

Publication information

J Korean Med Sci. 2025 Sep 8;40(35):e220. doi: 10.3346/jkms.2025.40.e220.

Abstract

BACKGROUND

With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, shows higher diagnostic variability than that of samples from other organs. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses. However, large-scale skin pathology image datasets for AI learning are often insufficient or limited to specific diseases. This study aimed to build and assess a large-scale dermatopathology image dataset for training AI models.

METHODS

We trained and evaluated a lesion segmentation model based on this dataset, which consisted of over 34,376 histopathology slide images collected from four institutions, including normal skin and six types of common skin lesions: epidermal cysts, seborrheic keratosis, Bowen disease/squamous cell carcinoma, basal cell carcinoma, melanocytic nevus, and malignant melanoma. Each image was accompanied by labeled data consisting of lesion area annotations and clinical information. To ensure the high quality and accuracy of the dataset, we applied data quality management methods, including evaluations of syntactic accuracy, semantic accuracy, statistical diversity, and validity.

RESULTS

The results of the dataset quality assessment confirmed high quality, with syntactic accuracy and semantic accuracy at 0.99 and 0.95, respectively. Statistical diversity was verified to follow a natural distribution. The validity evaluation verified the strong performance of the segmentation model for each group of data, with a Dice score ranging from 80% to 91%.
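
For readers unfamiliar with the metric, the Dice score reported above measures the overlap between a predicted lesion mask and the annotated mask: twice the size of the intersection divided by the sum of the two mask sizes. The following is a minimal sketch of that standard formula, not the authors' evaluation code. The function name `dice_score`, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as lesion in both masks
    pred_area = 0     # pixels marked as lesion in the prediction
    truth_area = 0    # pixels marked as lesion in the annotation
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            intersection += p and t
            pred_area += p
            truth_area += t

    total = pred_area + truth_area
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 2 overlapping pixels, 3 lesion pixels in each mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
print(f"Dice: {dice_score(predicted, annotated):.3f}")
```

On this scale, the reported range of 80% to 91% corresponds to values of 0.80 to 0.91 returned by such a function.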

CONCLUSION

The results demonstrated that our constructed dataset provides a well-suited resource for deep learning training, offering a large-scale multi-institutional dermatopathology dataset that can drive advancements in AI-driven dermatopathology diagnosis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e63/12418205/615d4a356450/jkms-40-e220-g001.jpg
