Kim Sungjin, Chang Yongjun, An Sungjun, Kim Deokseok, Cho Jaegu, Oh Kyungho, Baek Seungkuk, Choi Bo K
Department of Artificial Intelligence, Cheju Halla University, Jeju 63092, Republic of Korea.
Research Lab, MTEG, Seoul 03920, Republic of Korea.
Cancers (Basel). 2024 Oct 14;16(20):3482. doi: 10.3390/cancers16203482.
This study modifies the U-Net architecture for pixel-level segmentation to automatically classify lesions in laryngeal endoscopic images. The modified U-Net incorporates five-level encoders and decoders, with an autoencoder layer that derives latent vectors representing the image characteristics. To enhance performance, a Wasserstein GAN (WGAN) was implemented to address issues common in traditional GANs, such as mode collapse and gradient explosion. The dataset consisted of 8171 images labeled with polygons in seven colors. Evaluation metrics, including the F1 score and intersection over union (IoU), revealed that benign tumors were detected with lower accuracy than other lesions, while cancers were detected with notably high accuracy. The model demonstrated an overall accuracy of 99%. This enhanced U-Net model shows strong potential for improving cancer detection, reducing diagnostic errors, and enabling earlier diagnosis in medical applications.
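The evaluation metrics named in the abstract, F1 score and intersection over union, can be sketched for binary segmentation masks as below. This is a minimal illustration in plain Python, not the authors' evaluation code; the example masks are invented for demonstration.

```python
# Minimal sketch of F1 and IoU for flat binary segmentation masks.
# Masks are lists of 0/1 values; these examples are illustrative only.

def iou(pred, target):
    """Intersection over union of two flat binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0

def f1(pred, target):
    """F1 score (equivalently, the Dice coefficient) of two flat binary masks."""
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2 * inter / total if total else 1.0

pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(iou(pred, target))  # 2 overlapping / 4 in the union -> 0.5
print(f1(pred, target))   # 2*2 / (3 + 3) -> 0.666...
```

In multi-class segmentation with seven lesion labels, as in this study, such per-class scores would typically be computed one class at a time against the polygon annotations and then reported per lesion type.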