Yuan Weicheng, Du Zeyu, Han Shuo
College of Basic Medicine, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China.
School of Health Science, University of Manchester, Sackville Street, Manchester, 610101, England, UK.
Discov Oncol. 2024 May 22;15(1):180. doi: 10.1007/s12672-024-01043-8.
Skin cancer is prevalent worldwide, and accurate diagnosis is needed to reduce its public health burden. Although artificial intelligence applied to image analysis and pattern recognition has improved the accuracy and efficiency of early skin cancer diagnosis, existing supervised learning methods are limited by their reliance on large amounts of labeled data. To overcome this labeling constraint and improve diagnostic performance, this study proposes a semi-supervised skin cancer diagnostic model based on Self-feedback Threshold Focal Learning (STFL), which can use a partially labeled set together with a large number of unlabeled medical images to train models for unseen scenarios. The proposed model dynamically adjusts the selection threshold for unlabeled samples during training, so that only reliable unlabeled samples are retained, and uses focal learning to mitigate the impact of class imbalance in subsequent training. The study is experimentally validated on the HAM10000 dataset, which includes images of various types of skin lesions, with experiments conducted at different scales of labeled samples. With just 500 annotated samples, the model achieves 0.77 accuracy, 0.6408 Kappa, 0.77 recall, 0.7426 precision, and 0.7462 F1-score, showing that it remains effective with limited labeled data. Further testing indicates that the semi-supervised model improves diagnostic accuracy and efficiency, underscoring the value of integrating unlabeled data. This model offers a new perspective on medical image processing and provides scientific support for the early diagnosis and treatment of skin cancer.
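The two mechanisms named in the abstract, a selection threshold that adapts during training and a focal loss that down-weights easy samples, can be illustrated with a minimal sketch. The abstract does not state the STFL update rule, so the threshold update below is an assumption: an exponential moving average of the batch's mean top-class confidence. The function names, the `momentum` and `gamma` values, and the example probabilities are all hypothetical; the focal loss follows the standard form -(1 - p_t)^gamma * log(p_t).

```python
import math


def focal_loss(probs, target, gamma=2.0):
    """Standard focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def update_threshold(threshold, batch_probs, momentum=0.9):
    """Assumed update rule (not taken from the paper): move the threshold
    toward the batch's mean top-class confidence with an EMA."""
    mean_conf = sum(max(p) for p in batch_probs) / len(batch_probs)
    return momentum * threshold + (1.0 - momentum) * mean_conf


def select_pseudo_labels(batch_probs, threshold):
    """Return (sample index, predicted class) for every unlabeled sample
    whose top-class confidence reaches the current threshold."""
    selected = []
    for i, p in enumerate(batch_probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected


# Hypothetical softmax outputs for four unlabeled images over three classes.
unlabeled_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
    [0.34, 0.33, 0.33],
]

threshold = 0.7  # hypothetical starting value
threshold = update_threshold(threshold, unlabeled_probs)
pseudo = select_pseudo_labels(unlabeled_probs, threshold)

# Focal loss on the retained pseudo-labeled samples only.
losses = [focal_loss(unlabeled_probs[i], c) for i, c in pseudo]
```

In this sketch only the two confident predictions pass the threshold, and the focal term gives the less confident of them the larger loss. A real training loop would compute these quantities on model outputs per batch and combine the result with the supervised loss on the labeled samples.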