

Compact Vision-Language Models Enable Efficient and Interpretable Automated OCT Analysis Through Layer Specific Multimodal Learning.

Authors

Haghighi Tania, Gholami Sina, Sokol Jared Todd, Lim Jennifer I, Leng Theodore, Thompson Atalie C, Tabkhi Hamed, Alam Minhaj Nur

Affiliations

Department of Electrical and Computer Engineering, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.

Byers Eye Institute at Stanford, Stanford University School of Medicine, Stanford, CA 94305, USA.

Publication

bioRxiv. 2025 Aug 11:2025.08.07.669187. doi: 10.1101/2025.08.07.669187.

DOI: 10.1101/2025.08.07.669187
PMID: 40832232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12363835/
Abstract

Translating the intricate anatomical signatures of retinal disease from OCT B-scans into clear, accurate clinical narratives demands AI models that seamlessly fuse visual features with domain expertise. We curated a multimodal dataset of 40,000 OCT B-scans from public repositories and private clinical cohorts, each paired with an expert-validated summary spanning six conditions: diabetic macular edema, diabetic retinopathy, geographic atrophy, drusen, choroidal neovascularization, and healthy retina. We introduce LO-VLM, a compact (247M-parameter) vision-language model (VLM) that infuses anatomical guidance into both encoder and decoder for free-form summary generation and multiclass disease classification. Benchmarking against state-of-the-art RetinaVLM, LLaVA-Med, and a vision-only ViT model demonstrates superior performance. In a blinded evaluation in which three board-certified retina specialists scored the generated summaries, LO-VLM narratives achieved a mean of 8.5 (standard deviation = 1.15) out of 10, compared with a mean of 5.5 (standard deviation = 1.13) for RetinaVLM (p < 0.0001). In quantitative evaluations, LO-VLM achieved an SBERT similarity of 0.803 and a BERTScore F1 of 0.715, representing improvements of 8.2% and 28.8% over specialized VLM baselines. For disease classification, LO-VLM reached 96% accuracy (F1 = 96%), outperforming ViT by 13% and exceeding medical VLM benchmarks by over 62%. By reconciling interpretability with computational efficiency, LO-VLM establishes a new paradigm for efficient AI models in OCT interpretation.
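The SBERT similarity reported in the abstract is conventionally the cosine similarity between sentence embeddings of a generated summary and its reference. A minimal sketch of that comparison, assuming embeddings have already been produced by a sentence encoder (the short vectors below are illustrative placeholders, not real SBERT outputs, which typically have 384 or more dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder embeddings standing in for encoder outputs of a
# generated summary and a reference summary.
generated = [0.12, 0.85, 0.31, 0.44]
reference = [0.10, 0.80, 0.35, 0.40]

score = cosine_similarity(generated, reference)
print(round(score, 3))  # → 0.998
```

A corpus-level SBERT score such as the 0.803 reported here would be the mean of this per-pair similarity over the evaluation set.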


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/e11dbcf41f22/nihpp-2025.08.07.669187v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/905079235917/nihpp-2025.08.07.669187v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/4076046057c5/nihpp-2025.08.07.669187v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/c2e4165ee1ea/nihpp-2025.08.07.669187v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/46684e7960fc/nihpp-2025.08.07.669187v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/02e6aa4558cf/nihpp-2025.08.07.669187v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/610d/12363835/17aa48f20f3f/nihpp-2025.08.07.669187v1-f0007.jpg

Similar Articles

1
Compact Vision-Language Models Enable Efficient and Interpretable Automated OCT Analysis Through Layer Specific Multimodal Learning.
bioRxiv. 2025 Aug 11:2025.08.07.669187. doi: 10.1101/2025.08.07.669187.
2
Menstrual Health Education Using a Specialized Large Language Model in India: Development and Evaluation Study of MenstLLaMA.
J Med Internet Res. 2025 Jul 16;27:e71977. doi: 10.2196/71977.
3
Artificial intelligence for diagnosing exudative age-related macular degeneration.
Cochrane Database Syst Rev. 2024 Oct 17;10(10):CD015522. doi: 10.1002/14651858.CD015522.pub2.
4
Optical coherence tomography (OCT) for detection of macular oedema in patients with diabetic retinopathy.
Cochrane Database Syst Rev. 2015 Jan 7;1(1):CD008081. doi: 10.1002/14651858.CD008081.pub3.
5
Optical coherence tomography (OCT) for detection of macular oedema in patients with diabetic retinopathy.
Cochrane Database Syst Rev. 2011 Jul 6;(7):CD008081. doi: 10.1002/14651858.CD008081.pub2.
6
Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
7
CXR-MultiTaskNet: a unified deep learning framework for joint disease localization and classification in chest radiographs.
Sci Rep. 2025 Aug 31;15(1):32022. doi: 10.1038/s41598-025-16669-z.
8
Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.
JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.
9
Radiology report generation using automatic keyword adaptation, frequency-based multi-label classification and text-to-text large language models.
Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.
10
Prescription of Controlled Substances: Benefits and Risks

References Cited in This Article

1
Specialized curricula for training vision language models in retinal image analysis.
NPJ Digit Med. 2025 Aug 19;8(1):532. doi: 10.1038/s41746-025-01893-8.
2
VisionUnite: a Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge.
IEEE Trans Pattern Anal Mach Intell. 2025 Aug 13;PP. doi: 10.1109/TPAMI.2025.3598734.
3
EYE-Llama, an in-domain large language model for ophthalmology.
iScience. 2025 Jun 23;28(7):112984. doi: 10.1016/j.isci.2025.112984. eCollection 2025 Jul 18.
4
Distributed training of foundation models for ophthalmic diagnosis.
Commun Eng. 2025 Jan 22;4(1):6. doi: 10.1038/s44172-025-00341-5.
5
Collaboration between clinicians and vision-language models in radiology report generation.
Nat Med. 2025 Feb;31(2):599-608. doi: 10.1038/s41591-024-03302-1. Epub 2024 Nov 7.
6
A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision.
Med Image Anal. 2025 Jan;99:103357. doi: 10.1016/j.media.2024.103357. Epub 2024 Oct 1.
7
Automated classification of choroidal neovascularization, diabetic macular edema, and drusen from retinal OCT images using vision transformers: a comparative study.
Lasers Med Sci. 2024 May 27;39(1):140. doi: 10.1007/s10103-024-04089-w.
8
OCTDL: Optical Coherence Tomography Dataset for Image-Based Deep Learning Methods.
Sci Data. 2024 Apr 11;11(1):365. doi: 10.1038/s41597-024-03182-7.
9
A foundation model for generalizable disease detection from retinal images.
Nature. 2023 Oct;622(7981):156-163. doi: 10.1038/s41586-023-06555-x. Epub 2023 Sep 13.
10
DeepRetina: Layer Segmentation of Retina in OCT Images Using Deep Learning.
Transl Vis Sci Technol. 2020 Dec 9;9(2):61. doi: 10.1167/tvst.9.2.61. eCollection 2020 Dec.