Specialized curricula for training vision language models in retinal image analysis.

Author information

Holland Robbie, Taylor Thomas R P, Holmes Christopher, Riedl Sophie, Mai Julia, Patsiamanidi Maria, Mitsopoulou Dimitra, Hager Paul, Müller Philip, Paetzold Johannes C, Scholl Hendrik P N, Bogunović Hrvoje, Schmidt-Erfurth Ursula, Rueckert Daniel, Sivaprasad Sobha, Lotery Andrew J, Menten Martin J

Affiliations

Biomedical Image Analysis, Department of Computing, Imperial College London, London, United Kingdom.

Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom.

Publication information

NPJ Digit Med. 2025 Aug 19;8(1):532. doi: 10.1038/s41746-025-01893-8.

Abstract

Clinicians spend significant time reviewing medical images and transcribing findings. By integrating visual and textual data, foundation models have the potential to reduce workloads and boost efficiency, yet their practical clinical value remains uncertain. In this study, we find that OpenAI's ChatGPT-4o and two medical vision-language models (VLMs) significantly underperform ophthalmologists in key tasks for age-related macular degeneration (AMD). To address this, we developed a dedicated training curriculum, designed by domain specialists, to optimize VLMs for tasks related to clinical decision making. The resulting model, RetinaVLM-Specialist, significantly outperforms foundation medical VLMs and ChatGPT-4o in AMD disease staging (F1: 0.63 vs. 0.33) and referral (0.67 vs. 0.50), achieving performance comparable to junior ophthalmologists. In a reader study, two senior ophthalmologists confirmed that RetinaVLM's reports were substantially more accurate than those written by ChatGPT-4o (64.3% vs. 14.3%). Overall, our curriculum-based approach offers a blueprint for adapting foundation models to real-world medical applications.
