Dept. de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain.
Hospital Clínico Universitario de Valladolid, Valladolid, Spain.
Sci Data. 2024 Oct 5;11(1):1088. doi: 10.1038/s41597-024-03944-3.
Accurate detection and classification of lung malignancies are crucial for early diagnosis, treatment planning, and patient prognosis. Conventional histopathological analysis is time-consuming, which limits its clinical applicability. To address this, we present a dataset of 691 high-resolution (1200 × 1600 pixels) histopathological lung images from 45 patients, covering adenocarcinomas, squamous cell carcinomas, and normal tissues. Images of both pathological types are subdivided into three differentiation levels (well, moderately, and poorly differentiated), which together with normal tissue yields seven classes for classification. The dataset includes images at 20x and 40x magnification, reflecting real clinical diversity. We evaluated image classification using deep neural network and multiple instance learning approaches. Each method was used to classify images at 20x and 40x magnification into three superclasses. We achieved accuracies between 81% and 92%, depending on the method and magnification, demonstrating the dataset's utility.
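The class structure described in the abstract can be illustrated with a minimal sketch. The label strings below are hypothetical and the dataset's actual file or label names may differ. The sketch also assumes that the three superclasses are adenocarcinoma, squamous cell carcinoma, and normal tissue, which the abstract implies but does not state explicitly.

```python
from itertools import product

# Hypothetical label names chosen for illustration only.
TUMOR_TYPES = ("adenocarcinoma", "squamous_cell_carcinoma")
DIFFERENTIATION = ("well", "moderately", "poorly")
NORMAL = "normal"

# Seven fine-grained classes: 2 tumor types x 3 differentiation levels + normal.
FINE_CLASSES = [f"{t}_{d}" for t, d in product(TUMOR_TYPES, DIFFERENTIATION)]
FINE_CLASSES.append(NORMAL)


def to_superclass(fine_label: str) -> str:
    """Collapse a fine-grained label into one of the assumed three superclasses."""
    if fine_label == NORMAL:
        return NORMAL
    for tumor_type in TUMOR_TYPES:
        if fine_label.startswith(tumor_type + "_"):
            return tumor_type
    raise ValueError(f"Unknown label: {fine_label!r}")


# Three superclasses obtained by dropping the differentiation level.
SUPERCLASSES = sorted({to_superclass(label) for label in FINE_CLASSES})
```

Under these assumptions, the seven-class task distinguishes differentiation levels, while the three-superclass task used in the reported evaluation ignores them.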