Das Adriteyo, Agarwal Vedant, Shetty Nisha P
Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
Department of Humanities and Management, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
Front Artif Intell. 2025 Aug 18;8:1608837. doi: 10.3389/frai.2025.1608837. eCollection 2025.
BACKGROUND/INTRODUCTION: Skin lesion classification poses a critical diagnostic challenge in dermatology, where early and accurate identification has a direct impact on patient outcomes. While deep learning approaches have shown promise using dermatoscopic images alone, the integration of clinical metadata remains underexplored despite its potential to enhance diagnostic accuracy.
METHODS: We developed a novel multimodal data fusion framework that systematically integrates dermatoscopic images with clinical metadata for the classification of skin lesions. Using the HAM10000 dataset, we evaluated multiple fusion strategies, including simple concatenation, weighted concatenation, self-attention mechanisms, and cross-attention fusion. Clinical features were processed through a customized Multi-Layer Perceptron (MLP), while images were analyzed using a modified Residual Network (ResNet) architecture. Model interpretability was enhanced using Gradient-weighted Class Activation Mapping (Grad-CAM) visualization to identify the contribution of clinical attributes to classification decisions.
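To make the fusion design concrete, the following is a minimal PyTorch sketch of cross-attention fusion between an image branch and a metadata branch. The ResNet-18 backbone, layer widths, metadata dimensionality, and seven-class output head (HAM10000 covers seven lesion categories) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: cross-attention fusion of dermatoscopic image features
# with clinical metadata. Layer sizes and the ResNet-18 backbone are
# illustrative assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class CrossAttentionFusion(nn.Module):
    def __init__(self, num_classes=7, meta_dim=3, embed_dim=256):
        super().__init__()
        # Image branch: ResNet backbone with its classifier head removed,
        # yielding a 512-d image embedding.
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.image_encoder = backbone
        self.image_proj = nn.Linear(512, embed_dim)

        # Metadata branch: small MLP over clinical features (e.g. age,
        # sex, lesion localization after suitable numeric encoding;
        # meta_dim=3 is a placeholder for the encoded width).
        self.meta_mlp = nn.Sequential(
            nn.Linear(meta_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim), nn.ReLU(),
        )

        # Cross-attention: the image embedding (query) attends to the
        # clinical embedding (key/value), letting metadata modulate the
        # visual representation before classification.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim, num_heads=4, batch_first=True
        )
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, image, meta):
        img = self.image_proj(self.image_encoder(image)).unsqueeze(1)  # (B, 1, D)
        met = self.meta_mlp(meta).unsqueeze(1)                         # (B, 1, D)
        fused, _ = self.cross_attn(query=img, key=met, value=met)     # (B, 1, D)
        return self.classifier(fused.squeeze(1))                      # (B, classes)


# Smoke test with random inputs.
model = CrossAttentionFusion()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3))
print(logits.shape)  # torch.Size([2, 7])
```

Simple concatenation would replace the attention step with torch.cat over the two embeddings; weighted concatenation would scale each embedding by a learned weight first. Cross-attention instead learns, per sample, how much each clinical cue should reshape the image representation, which is what allows it to capture inter-modal dependencies.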
RESULTS: Cross-attention fusion achieved the highest classification accuracy, demonstrating superior performance compared to unimodal approaches and simpler fusion techniques. The multimodal framework significantly outperformed image-only baselines, with cross-attention effectively capturing inter-modal dependencies and contextual relationships between the visual and clinical data modalities.
DISCUSSION/CONCLUSIONS: Our findings demonstrate that integrating clinical metadata with dermatoscopic images substantially improves the accuracy of skin lesion classification. However, challenges such as class imbalance and the computational complexity of advanced fusion methods require further investigation.