
Specialized curricula for training vision language models in retinal image analysis.

Authors

Holland Robbie, Taylor Thomas R P, Holmes Christopher, Riedl Sophie, Mai Julia, Patsiamanidi Maria, Mitsopoulou Dimitra, Hager Paul, Müller Philip, Paetzold Johannes C, Scholl Hendrik P N, Bogunović Hrvoje, Schmidt-Erfurth Ursula, Rueckert Daniel, Sivaprasad Sobha, Lotery Andrew J, Menten Martin J

Affiliations

Biomedical Image Analysis, Department of Computing, Imperial College London, London, United Kingdom.

Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom.

Publication information

NPJ Digit Med. 2025 Aug 19;8(1):532. doi: 10.1038/s41746-025-01893-8.

DOI: 10.1038/s41746-025-01893-8
PMID: 40830259

Abstract

Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI's ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists in key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks related to clinical decision making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (0.67 vs. 0.50), achieving performance comparable to junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM's reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.
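
The staging and referral results above are reported as F1 scores. As a minimal sketch of how such a score can be computed, the Python below derives per-class F1 and its unweighted (macro) average from paired labels. The abstract does not say which averaging the authors used, so macro averaging is an assumption here, and the stage names and label lists are hypothetical illustration data, not study data.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical AMD stage labels, for illustration only.
y_true = ["early", "early", "intermediate", "late", "late", "late"]
y_pred = ["early", "intermediate", "intermediate", "late", "late", "early"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class)
print(round(overall, 3))
```

Macro averaging weights every stage equally regardless of how many scans fall into it, which matters when late-stage cases are rare; a support-weighted average would give a different number on the same predictions.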



Cited by

1
Compact Vision-Language Models Enable Efficient and Interpretable Automated OCT Analysis Through Layer Specific Multimodal Learning.
bioRxiv. 2025 Aug 11:2025.08.07.669187. doi: 10.1101/2025.08.07.669187.
