Faculty of Computer Science (FACOM) - Federal University of Uberlândia (UFU), Av. João Naves de Ávila 2121, BLB, 38400-902, Uberlândia, MG, Brazil.
Federal Institute of Triângulo Mineiro (IFTM), R. Belarmino Vilela Junqueira, S/N, 38305-200, Ituiutaba, MG, Brazil.
J Imaging Inform Med. 2024 Aug;37(4):1691-1710. doi: 10.1007/s10278-024-01041-w. Epub 2024 Feb 26.
Early diagnosis of potentially malignant disorders, such as oral epithelial dysplasia, is the most reliable way to prevent oral cancer. Computational algorithms have been used as auxiliary tools to aid specialists in this process. However, experiments are usually performed on private data, which makes the results difficult to reproduce. Several public datasets of histological images exist, but studies focused on oral dysplasia rely on inaccessible datasets, which hinders the improvement of algorithms aimed at this lesion. This study introduces an annotated public dataset of oral epithelial dysplasia tissue images. The dataset includes 456 images acquired from 30 mouse tongues. The images were categorized by lesion grade, with nuclear structures manually marked by a trained specialist and validated by a pathologist. Experiments were also carried out to illustrate the potential of the proposed dataset in the classification and segmentation tasks commonly explored in the literature. Convolutional neural network (CNN) models for semantic and instance segmentation were applied to the images, which were pre-processed with stain normalization methods. The segmented and non-segmented images were then classified with CNN architectures and machine learning algorithms. The data obtained through these processes are available in the dataset. In the segmentation stage, the best F1-score was 0.83, obtained with the U-Net model using ResNet-50 as a backbone. In the classification stage, the best result was achieved with the Random Forest method, with an accuracy of 94.22%. The results show that segmentation contributed to the classification results, but further studies are needed to improve these stages of automated diagnosis. The original, gold standard, normalized, and segmented images are publicly available and may be used to improve clinical applications of computer-aided diagnosis (CAD) methods on oral epithelial dysplasia tissue images.
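As a reading aid for the segmentation metric, the sketch below shows one common way to compute an F1-score between a predicted nuclear mask and a gold standard mask. It assumes a pixel-wise comparison of binary masks; the abstract does not state whether the reported 0.83 was computed per pixel or per object. The function name `f1_score_masks` and the toy masks are hypothetical and are not taken from the dataset.

```python
def f1_score_masks(pred, gold):
    """Pixel-wise F1 between two binary masks given as nested lists of 0/1.

    Assumption: both masks have the same shape, and 1 marks a nucleus pixel.
    """
    tp = fp = fn = 0
    for pred_row, gold_row in zip(pred, gold):
        for p, g in zip(pred_row, gold_row):
            if p and g:
                tp += 1      # nucleus pixel found by the model
            elif p and not g:
                fp += 1      # background pixel marked as nucleus
            elif g and not p:
                fn += 1      # nucleus pixel missed by the model
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


# Toy 2x3 masks (hypothetical values, for illustration only).
pred = [[1, 1, 0],
        [0, 1, 0]]
gold = [[1, 0, 0],
        [0, 1, 1]]

print(round(f1_score_masks(pred, gold), 4))
```

F1 is the harmonic mean of precision and recall, so it penalizes both spurious and missed nucleus pixels, which is why it is often preferred to plain accuracy when background pixels dominate an image.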