
Echoes of images: multi-loss network for image retrieval in vision transformers.

Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India.

Department of Electrical Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India.

Publication Information

Med Biol Eng Comput. 2024 Jul;62(7):2037-2058. doi: 10.1007/s11517-024-03055-6. Epub 2024 Mar 4.

Abstract

This paper introduces a novel approach to enhance content-based image retrieval, validated on two benchmark datasets: ISIC-2017 and ISIC-2018. These datasets comprise skin lesion images that are crucial for innovations in skin cancer diagnosis and treatment. We advocate the use of a pre-trained Vision Transformer (ViT), a relatively uncharted concept in image retrieval, particularly in medical scenarios. In contrast to the traditionally employed Convolutional Neural Networks (CNNs), our findings suggest that ViT offers a more comprehensive understanding of the image context, which is essential in medical imaging. We further incorporate a weighted multi-loss function, exploring losses such as triplet loss, distillation loss, contrastive loss, and cross-entropy loss. We investigate the most resilient combination of these losses to create a robust multi-loss function, thus strengthening the learned feature space and improving precision and recall in the retrieval process. Instead of using all the loss functions, the proposed multi-loss function combines only cross-entropy loss, triplet loss, and distillation loss, yielding improvements of 6.52% and 3.45% in mean average precision on ISIC-2017 and ISIC-2018, respectively. Another innovation in our methodology is a two-branch network strategy that concurrently boosts image retrieval and classification. Through our experiments, we highlight the effectiveness and the pitfalls of diverse loss configurations in image retrieval. Furthermore, our approach underlines the advantages of retrieval-based classification through majority voting over relying solely on the classification head, leading to improved prediction for melanoma, the most lethal type of skin cancer. Our results surpass existing state-of-the-art techniques on the ISIC-2017 and ISIC-2018 datasets, improving mean average precision by 1.01% and 4.36%, respectively, emphasizing the efficacy and promise of Vision Transformers paired with our tailor-made weighted loss function, especially in medical contexts. The effectiveness of the proposed approach is substantiated through thorough ablation studies and an array of quantitative and qualitative results. To promote reproducibility and support future research, our source code will be made available on GitHub.
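The abstract's central technical idea is a weighted combination of cross-entropy, triplet, and distillation losses. Below is a minimal PyTorch sketch of such a combination, assuming batch-hard triplet mining and standard Hinton-style soft-label distillation; the weights, margin, and temperature are illustrative placeholders, not the values or exact formulation reported by the paper.

```python
# Sketch of a weighted multi-loss: cross-entropy + triplet + distillation.
# w_ce, w_tri, w_kd, margin, and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def multi_loss(logits, embeddings, labels, teacher_logits,
               w_ce=1.0, w_tri=1.0, w_kd=1.0, margin=0.3, temperature=3.0):
    # Classification branch: standard cross-entropy on the student logits.
    ce = F.cross_entropy(logits, labels)

    # Retrieval branch: batch-hard triplet loss on the embedding space.
    dist = torch.cdist(embeddings, embeddings)             # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # same-class mask
    hardest_pos = (dist * same.float()).max(dim=1).values  # farthest same-class sample
    masked = dist + same.float() * dist.max()              # exclude positives from min
    hardest_neg = masked.min(dim=1).values                 # closest other-class sample
    tri = F.relu(hardest_pos - hardest_neg + margin).mean()

    # Distillation: KL divergence between temperature-softened student and
    # teacher distributions.
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    return w_ce * ce + w_tri * tri + w_kd * kd
```

In a two-branch setup like the one described, the cross-entropy term would train the classification head while the triplet and distillation terms shape the embedding used for retrieval; the weighting lets one tune the balance between the two objectives.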

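The retrieval-based classification the abstract favors over the classification head can be illustrated with a short sketch: rank a gallery of embedded training images by similarity to the query and take a majority vote over the top-k retrieved labels. The cosine-similarity ranking and k=5 are assumptions for illustration, not details confirmed by the paper.

```python
# Sketch of retrieval-based classification by majority voting over the
# top-k retrieved gallery images. k and cosine similarity are assumptions.
from collections import Counter
import torch
import torch.nn.functional as F

def retrieve_and_vote(query_emb, gallery_embs, gallery_labels, k=5):
    # query_emb: (D,), gallery_embs: (N, D), gallery_labels: (N,)
    sims = F.cosine_similarity(query_emb.unsqueeze(0), gallery_embs)  # (N,)
    topk = sims.topk(k).indices                    # indices of k nearest neighbors
    votes = Counter(gallery_labels[i].item() for i in topk)
    return votes.most_common(1)[0][0]              # majority label among top-k
```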
