Hybrid attention-based deep learning for multi-label ophthalmic disease detection on fundus images.

Author information

Hanfi Rabiya, Mathur Harsh, Shrivastava Ritu

Affiliations

Department of Computer Science & Engineering, Rabindranath Tagore University, Bhopal, India.

Department of Computer Science & Engineering, Sagar Institute of Research and Technology, Bhopal, India.

Publication information

Graefes Arch Clin Exp Ophthalmol. 2025 May 29. doi: 10.1007/s00417-025-06858-x.

DOI:10.1007/s00417-025-06858-x

PMID:40439748

Abstract

BACKGROUND

Ophthalmic diseases significantly impact vision and quality of life. Early diagnosis using fundus images is critical for timely treatment. Traditional deep learning models often lack accuracy, interpretability, and efficiency for multi-label classification tasks in ophthalmology.

METHODS

We propose HAM-DNet, a hybrid deep learning model combining EfficientNetV2 and Vision Transformers (ViT) for multi-label ophthalmic disease detection. The model includes SE (Squeeze-and-Excitation) blocks for attention-based feature refinement and a U-Net-based lesion localization module for improved interpretability. The model was trained and tested on multiple fundus image datasets (ODIR-5K, Messidor, G1020, and Joint Shantou International Eye Centre).
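
To illustrate the attention-based feature refinement mentioned above, the following is a minimal NumPy sketch of a generic Squeeze-and-Excitation block. It is not the authors' implementation: the abstract does not say where the SE blocks sit in HAM-DNet or which reduction ratio is used, so the function name `se_block`, the channel count, the reduction ratio, and the random weights are all assumptions chosen for illustration, and bias terms are omitted for brevity.

```python
import numpy as np

def se_block(x, w_reduce, w_expand):
    """Generic Squeeze-and-Excitation over a feature map x of shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = x.mean(axis=(1, 2))                              # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(w_reduce @ z, 0.0)               # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))    # shape (C,)
    # Scale: reweight every channel of the input by its gate value.
    return x * gate[:, None, None], gate

# Illustrative sizes and random weights (assumptions, not values from the paper).
rng = np.random.default_rng(0)
channels, reduction = 8, 4
x = rng.standard_normal((channels, 5, 5))
w_reduce = rng.standard_normal((channels // reduction, channels)) * 0.1
w_expand = rng.standard_normal((channels, channels // reduction)) * 0.1

y, gate = se_block(x, w_reduce, w_expand)
print(y.shape, gate.round(3))
```

The output keeps the input shape; only the relative weighting of channels changes, which is why such a block can be inserted into a convolutional backbone without altering the surrounding layers.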

RESULTS

HAM-DNet achieved superior performance with an accuracy of 95.3%, precision of 96.2%, recall of 97.1%, AUC of 98.42%, and F1-score of 96.75%, while maintaining low computational cost (9.7 GFLOPS). It outperformed existing models including Shallow CNN and EfficientNet, particularly in handling multi-label classifications and reducing false positives and negatives.
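
For readers unfamiliar with how precision, recall, and F1 are computed when each image can carry several labels, here is a small pure-Python sketch. The abstract does not state the decision threshold or the averaging scheme behind the reported figures, so the 0.5 threshold, the micro-averaging, the function name `multilabel_metrics`, and the toy labels and scores below are assumptions for illustration only.

```python
def multilabel_metrics(y_true, y_prob, threshold=0.5):
    """Micro-averaged precision, recall, and F1 for multi-label predictions."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = 1 if p >= threshold else 0
            if pred == 1 and t == 1:
                tp += 1          # label correctly predicted
            elif pred == 1 and t == 0:
                fp += 1          # false positive
            elif pred == 0 and t == 1:
                fn += 1          # false negative (missed label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data: 3 images, 4 hypothetical disease labels per image.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_prob = [[0.9, 0.2, 0.6, 0.7], [0.1, 0.8, 0.3, 0.4], [0.4, 0.9, 0.1, 0.2]]

precision, recall, f1 = multilabel_metrics(y_true, y_prob)
print(precision, recall, f1)
```

Micro-averaging pools true positives, false positives, and false negatives across all labels before computing the ratios; macro-averaging would instead compute the metrics per label and average them, and the two can give different values on imbalanced data.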

CONCLUSIONS

HAM-DNet offers a robust, accurate, and interpretable solution for automated detection of multiple ophthalmic diseases. Its lightweight architecture makes it suitable for clinical deployment, especially in telemedicine and resource-constrained environments.
