Explainable self-supervised learning for medical image diagnosis based on DINO V2 model and semantic search.

Authors

Hussien Alaa, Elkhateb Abdelkareem, Saeed Mai, Elsabawy Nourhan M, Elnakeeb Alaa Ebraheem, Elrashidy Nora

Affiliations

Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kaferelshikh University, Kaferelshikh, 33511, Egypt.

Biological Artificial Intelligence Program, Faculty of Artificial Intelligence, Kafer Elsheikh University, Kaferelshikh, Egypt.

Publication information

Sci Rep. 2025 Sep 1;15(1):32174. doi: 10.1038/s41598-025-15604-6.

DOI:10.1038/s41598-025-15604-6

PMID:40890188

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12402508/

Abstract

Medical images have become indispensable for clinical decision-making and significantly affect treatment planning. However, the growing volume of medical imaging has widened the gap between the number of images and the number of available radiologists, leading to delays and diagnostic errors. Recent studies highlight the potential of deep learning (DL) in medical image diagnosis, but the reliance of DL models on labelled data limits their applicability in various clinical settings. As a result, recent studies explore the role of self-supervised learning (SSL) in overcoming these challenges. Our study addresses them by examining the performance of SSL on diverse medical image datasets and comparing it with traditional pre-trained supervised learning (SL) models. Unlike prior SSL methods that focus solely on classification, our framework leverages DINOv2's embeddings to enable semantic search in medical databases (via Qdrant), allowing clinicians to retrieve similar cases efficiently. This addresses a critical gap in clinical workflows where rapid case [...] The results affirmed the ability of SSL, especially DINOv2, to overcome the challenges associated with data labelling and to provide diagnoses more accurate than traditional SL. DINOv2 achieves classification accuracies of 100%, 99%, 99%, 100% and 95% on the lung cancer, brain tumour, leukaemia and eye retina disease datasets, respectively. While existing SSL models (e.g., BYOL, SimCLR) lack interpretability, we uniquely combine DINOv2 with ViT-CX, a causal explanation method tailored for transformers. This provides clinically actionable heatmaps that reveal how the model localizes tumors and cellular patterns, a feature absent in prior SSL medical imaging studies. Furthermore, our research explores the impact of semantic search in the medical image domain and how it can revolutionize the querying process by returning semantically similar results. The embeddings produced by the trained model are stored in Qdrant, and cosine similarity is used to measure how close a query image is to the stored embeddings. Our study aims to enhance the efficiency and accuracy of medical image analysis, ultimately improving the decision-making process.
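The retrieval step described in the abstract, comparing a query image's embedding with stored embeddings by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the in-memory dictionary standing in for the vector store, and the toy 4-dimensional vectors are all assumptions made for illustration, and no DINOv2 or Qdrant API is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, index, top_k=3):
    """Return the top_k (case_id, score) pairs, most similar first."""
    scored = [(case_id, cosine_similarity(query, vec))
              for case_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for stored image embeddings.
index = {
    "case_a": [0.9, 0.1, 0.0, 0.2],
    "case_b": [0.1, 0.8, 0.3, 0.0],
    "case_c": [0.85, 0.15, 0.05, 0.25],
}

# Hypothetical embedding of the query image.
query = [1.0, 0.1, 0.0, 0.2]

results = search(query, index, top_k=2)
for case_id, score in results:
    print(f"{case_id}: {score:.4f}")
```

In a deployed system the dictionary would be replaced by a vector database and the vectors by model embeddings; the ranking logic stays the same.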
