Iinuma Kimi, Fujii Kazuyasu, Nakashima Chisa, Kasai Kenichiro, Irie Hiroyuki, Kanetomo Hitonari, Yanagihara Shigeto, Sato Sayuri, Uhara Hisashi, Takeda Fumiaki, Otsuka Atsushi
Dermatology, Kindai University Hospital, Osaka, JPN.
Plastic Surgery, Kasai Clinic for Plastic Surgery, Osaka, JPN.
Cureus. 2025 Jul 24;17(7):e88711. doi: 10.7759/cureus.88711. eCollection 2025 Jul.
Pigmented skin lesions span benign to malignant entities that often appear similar on standard clinical photographs, complicating accurate diagnosis without specialized imaging. Recently, multimodal large language models (MMLLMs) have attracted attention as image-based diagnostic aids and hold promise as decision-support tools in resource-limited settings where dermoscopy may be unavailable.
This study aimed to determine whether a fine-tuned MMLLM can accurately classify eight common pigmented skin conditions using only clinical photographs, thereby providing a non-dermoscopic diagnostic support tool.
We fine-tuned InstructBLIP-flan-t5-xl (Salesforce, San Francisco, CA) using Hugging Face's Seq2SeqTrainer (Hugging Face Inc., New York City, NY) on a curated dataset of 979 manually cropped regions of interest, each depicting one of eight lesion types (acquired dermal melanocytosis, basal cell carcinoma, ephelis, malignant melanoma, melasma, nevus, seborrheic keratosis, or solar lentigo). Images were split 80% for training and 20% for validation. During training, lesion labels were masked to encourage learning of visual-text correlations. Model performance was evaluated by macro-average sensitivity, specificity, F1 score, and the per-class area under the receiver operating characteristic curve (ROC AUC).
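The split and label-masking steps described above can be sketched as follows. This is a minimal illustration only: the function names, the mask token, and the exact masking strategy are assumptions, not the authors' released code, and the study's actual pipeline operated on image-caption pairs inside the Seq2SeqTrainer workflow.

```python
import random

# The eight lesion classes from the study.
LESION_TYPES = [
    "acquired dermal melanocytosis", "basal cell carcinoma", "ephelis",
    "malignant melanoma", "melasma", "nevus", "seborrheic keratosis",
    "solar lentigo",
]

def split_dataset(items, train_frac=0.8, seed=0):
    """Shuffle and split items into training and validation subsets
    (80/20 as in the study). Hypothetical helper, not the authors' code."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def mask_labels(caption, labels=LESION_TYPES, token="<mask>"):
    """Replace any lesion-type name appearing in a training caption so the
    model must infer the class from the image rather than from the text."""
    for name in labels:
        caption = caption.replace(name, token)
    return caption
```

With 979 regions of interest, an 80/20 split under this sketch yields 783 training and 196 validation images.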
On the validation set, the model achieved a macro-average sensitivity of 86.0%, specificity of 98.2%, and F1 score of 0.86. ROC AUC exceeded 0.95 for six of eight classes. Malignant melanoma showed the highest performance (sensitivity 94%, ROC AUC 0.98), while nevus exhibited the lowest sensitivity (78%, ROC AUC 0.89).
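For clarity on how the macro-averaged figures are derived, the one-vs-rest computation of sensitivity, specificity, and F1 can be sketched in plain Python. This is an illustrative reimplementation of standard definitions, not the authors' evaluation script; ROC AUC additionally requires per-class prediction scores and is omitted here.

```python
def macro_metrics(y_true, y_pred, classes):
    """Macro-averaged sensitivity, specificity, and F1 over all classes,
    computed one-vs-rest from paired lists of true and predicted labels."""
    sens, spec, f1 = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fn - fp
        # Sensitivity (recall): TP / (TP + FN); specificity: TN / (TN + FP).
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        # F1 = 2TP / (2TP + FP + FN).
        denom = 2 * tp + fp + fn
        f1.append(2 * tp / denom if denom else 0.0)
    n = len(classes)
    return sum(sens) / n, sum(spec) / n, sum(f1) / n
```

Because specificity counts the many true negatives contributed by the other seven classes, it tends to sit well above sensitivity in a multi-class setting, consistent with the 98.2% versus 86.0% gap reported.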
Fine-tuned MMLLMs can accurately classify common pigmented skin lesions from clinical photographs alone, enabling rapid diagnostic support in environments lacking dermoscopy. Future work should expand dataset diversity, undertake multicenter validation, and assess real-world clinical utility to confirm broader applicability.