OphGLM: An ophthalmology large language-and-vision assistant.

Affiliations

Shenzhen International Graduate School, Tsinghua University, Shenzhen, China.

Publication information

Artif Intell Med. 2024 Nov;157:103001. doi: 10.1016/j.artmed.2024.103001. Epub 2024 Oct 22.

Abstract

Vision-based computer-aided diagnostic methods have been used in early ophthalmic disease screening and diagnosis. However, the limited output formats of these methods lead to poor human-computer interaction and limited clinical applicability. Thus, ophthalmic visual question answering is worth studying. Unfortunately, no practical solutions existed before the advent of large language models (LLMs). In this paper, we investigate the ophthalmic visual diagnostic interaction problem. We construct an ophthalmology large language-and-vision assistant, OphGLM, consisting of an image encoder, a text encoder, a fusion module, and an LLM module. We establish a new Chinese ophthalmic fine-tuning dataset, FundusTuning-CN, which includes fundus instruction and conversation sets. Based on FundusTuning-CN, we establish a novel LLM-tuning strategy that introduces visual model understanding and ophthalmic knowledge into LLMs at low cost and with high efficiency. Leveraging the pre-training of the image encoder, OphGLM demonstrates strong visual understanding and surpasses open-source visual language models in common fundus disease classification tasks. FundusTuning-CN enables OphGLM to surpass open-source medical LLMs in both ophthalmic knowledge and interactive capabilities. Our proposed OphGLM has the potential to revolutionize clinical applications in ophthalmology. The dataset, code, and models will be publicly available at https://github.com/ML-AILab/OphGLM.
