Wen Chi, Ye Mang, Li He, Chen Ting, Xiao Xuan
IEEE Trans Med Imaging. 2025 Jan;44(1):57-68. doi: 10.1109/TMI.2024.3429148. Epub 2025 Jan 2.
Existing deep learning methods have achieved remarkable results in diagnosing retinal diseases, showcasing the potential of advanced AI in ophthalmology. However, the black-box nature of these methods obscures their decision-making process, compromising their trustworthiness and acceptability. Inspired by concept-based approaches and recognizing the intrinsic correlation between retinal lesions and diseases, we regard retinal lesions as concepts and propose an inherently interpretable framework designed to enhance both the performance and explainability of diagnostic models. Leveraging the transformer architecture, known for its proficiency in capturing long-range dependencies, our model can effectively identify lesion features. By incorporating image-level annotations, it aligns lesion concepts with human cognition under the guidance of a retinal foundation model. Furthermore, to attain interpretability without losing lesion-specific information, our method employs a classifier built on a cross-attention mechanism for disease diagnosis and explanation, where explanations are grounded in the contributions of human-understandable lesion concepts and their visual localization. Notably, owing to the structure and inherent interpretability of our model, clinicians can perform concept-level interventions, correcting diagnostic errors simply by adjusting erroneous lesion predictions. Experiments on four fundus image datasets demonstrate that our method performs favorably against state-of-the-art methods while providing faithful explanations and enabling concept-level interventions. Our code is publicly available at https://github.com/Sorades/CLAT.
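The idea of a concept-level intervention can be illustrated with a toy example. The sketch below is not the authors' CLAT implementation (see the repository for that); it is a minimal, pure-Python illustration under simplifying assumptions: each disease has a hypothetical query vector, each lesion concept has a hypothetical key vector, and the attended values are just the scalar lesion probabilities. All names (`cross_attention_diagnose`, `intervene`) and all numbers are invented for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention_diagnose(disease_queries, concept_keys, concept_probs):
    """Toy cross-attention readout (an assumption, not the paper's classifier).

    Each disease query attends over the lesion concepts. The contribution of
    a concept to a disease is its attention weight times the predicted lesion
    probability, and the disease score is the sum of those contributions.
    Returns (scores, contributions), with contributions[d][c] per disease d
    and concept c.
    """
    scale = math.sqrt(len(concept_keys[0]))
    scores, contributions = [], []
    for q in disease_queries:
        weights = softmax([dot(q, k) / scale for k in concept_keys])
        contrib = [w * p for w, p in zip(weights, concept_probs)]
        contributions.append(contrib)
        scores.append(sum(contrib))
    return scores, contributions


def intervene(concept_probs, corrections):
    """Return a copy of concept_probs with clinician corrections applied.

    corrections maps a concept index to its corrected probability.
    """
    fixed = list(concept_probs)
    for idx, value in corrections.items():
        fixed[idx] = value
    return fixed


# Hypothetical setup: two diseases, three lesion concepts, 2-d embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]           # disease_x, disease_y
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # lesion_a, lesion_b, lesion_c
probs = [0.1, 0.8, 0.5]                      # model's lesion predictions

before, _ = cross_attention_diagnose(queries, keys, probs)
# A clinician judges that lesion_a is in fact present and corrects it.
after, _ = cross_attention_diagnose(queries, keys, intervene(probs, {0: 0.9}))
print("before:", before)
print("after: ", after)
```

Because the disease scores are computed only from the lesion predictions and the attention weights, correcting a lesion prediction changes the diagnosis scores directly and without retraining, and the per-concept contributions show which lesions drove each score.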