Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images.

Authors

Li Zhongwen, Wang Zhouqian, Xiu Liheng, Zhang Pengyao, Wang Wenfang, Wang Yangyang, Chen Gang, Yang Weihua, Chen Wei

Affiliations

Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China.

National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China.

Publication

Front Cell Dev Biol. 2025 May 23;13:1600202. doi: 10.3389/fcell.2025.1600202. eCollection 2025.

Abstract

BACKGROUND

The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Building an affordable, accessible, interpretable, and automated AI system for non-clinical settings is crucial to expanding access to quality healthcare.

METHODS

This cross-sectional study developed the Multimodal Ocular Surface Assessment and Interpretation Copilot (MOSAIC) using three multimodal large language models (gpt-4-turbo, claude-3-opus, and gemini-1.5-pro-latest) to detect three ocular surface diseases (OSDs) and to grade keratitis and pterygium. A total of 375 smartphone-captured ocular surface images from 290 eyes were used to validate MOSAIC. Performance was evaluated in both zero-shot and few-shot settings across four tasks: image quality control, OSD detection, keratitis severity assessment, and pterygium grading. The interpretability of the system was also evaluated.
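As a concrete illustration of the few-shot setup described above, the sketch below assembles a five-shot multimodal prompt for OSD detection with the OpenAI Python SDK and gpt-4-turbo (calls to claude-3-opus and gemini-1.5-pro-latest would be analogous through their own SDKs). The prompt wording, label set, file paths, and helper names (`image_part`, `detect_osd`, `FEW_SHOT`) are illustrative assumptions; the abstract does not disclose the authors' actual prompts or pipeline.

```python
# Minimal sketch of a few-shot multimodal classification prompt.
# Prompt text, labels, and paths are placeholders, not the study protocol.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def image_part(path: str) -> dict:
    """Encode a local smartphone image as a data-URL content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}


# Hypothetical five-shot examples: (image path, expert label) pairs.
FEW_SHOT = [
    ("shots/keratitis_example.jpg", "keratitis"),
    ("shots/pterygium_example.jpg", "pterygium"),
    # ... up to five labelled examples in total
]


def detect_osd(query_image: str) -> str:
    """Classify one query image given the labelled in-context examples."""
    content = [{"type": "text",
                "text": "You are shown labelled ocular surface photographs, "
                        "then one unlabelled photograph. Reply with a single "
                        "disease label, or 'normal' if no OSD is present."}]
    for path, label in FEW_SHOT:
        content.append(image_part(path))
        content.append({"type": "text", "text": f"Label: {label}"})
    content.append(image_part(query_image))
    content.append({"type": "text", "text": "Label:"})

    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": content}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```

A zero-shot run would simply omit the FEW_SHOT pairs, which is the comparison the study uses to quantify the benefit of additional learning shots.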

RESULTS

Under the five-shot setting, MOSAIC achieved 95.00% accuracy in image quality control, 86.96% in OSD detection, 88.33% in distinguishing mild from severe keratitis, and 66.67% in pterygium grading. Performance improved significantly as the number of learning shots increased (p < 0.01). The system attained high ROUGE-L F1 scores of 0.70-0.78, demonstrating its interpretable image comprehension capability.
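The ROUGE-L F1 scores quoted above measure longest-common-subsequence overlap between a model-generated image description and a reference description. The pure-Python sketch below shows how that figure is computed; whitespace tokenization and the toy sentences are assumptions for illustration, since the abstract does not describe the authors' exact evaluation text or tooling.

```python
# ROUGE-L F1 from the longest common subsequence (LCS), as used to compare
# a generated description against a reference. Tokenization here is simple
# whitespace splitting; the study's exact setup is not given in the abstract.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Dynamic-programming LCS length between two token sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """F1 of LCS-based precision and recall (ROUGE-L)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical descriptions, not from the study):
print(rouge_l_f1(
    "corneal opacity with conjunctival injection suggesting keratitis",
    "corneal opacity and conjunctival injection consistent with keratitis",
))
```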

CONCLUSION

MOSAIC exhibited exceptional few-shot learning capabilities, achieving high accuracy in OSD management with minimal training examples. This system has significant potential for smartphone integration to enhance the accessibility and effectiveness of OSD detection and grading in resource-limited settings.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8cd9/12141289/92f1687b7a22/fcell-13-1600202-g001.jpg
