Multimodal reasoning agent for enhanced ophthalmic decision-making: a preliminary real-world clinical validation.

Author information

Zhuang Yijing, Fang Dong, Li Pengfeng, Bai Bingyu, Hei Xiangqing, Feng Lujia, Li Wangting, Zhang Shaochong

Affiliation

Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, Guangdong, China.

Publication information

Front Cell Dev Biol. 2025 Jul 23;13:1642539. doi: 10.3389/fcell.2025.1642539. eCollection 2025.

DOI:10.3389/fcell.2025.1642539

PMID:40772224

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12325206/

Abstract

Although large language models (LLMs) show significant potential in clinical practice, accurate diagnosis and treatment planning in ophthalmology require multimodal integration of imaging, clinical history, and guideline-based knowledge. Current LLMs predominantly focus on unimodal language tasks and face limitations in specialized ophthalmic diagnosis due to domain knowledge gaps, hallucination risks, and inadequate alignment with clinical workflows. This study introduces a structured reasoning agent (ReasonAgent) that integrates a multimodal visual analysis module, a knowledge retrieval module, and a diagnostic reasoning module to address the limitations of current AI systems in ophthalmic decision-making. Validated on 30 real-world ophthalmic cases (27 common and 3 rare diseases), ReasonAgent demonstrated diagnostic accuracy comparable to that of ophthalmology residents (β = -0.07, p = 0.65). In treatment planning, however, it significantly outperformed both GPT-4o (β = 0.49, p = 0.01) and residents (β = 1.71, p < 0.001), particularly in rare disease scenarios (all p < 0.05). Whereas GPT-4o was vulnerable in rare cases (90.48% of its diagnostic scores were low), ReasonAgent's hybrid design mitigated errors through structured reasoning. Statistical analysis identified significant case-level heterogeneity (diagnosis ICC = 0.28), highlighting the need for domain-specific AI solutions in complex clinical contexts. This framework establishes a novel paradigm for domain-specific AI in real-world clinical practice and demonstrates the potential of modular architectures to improve decision fidelity through human-aligned reasoning pathways.
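The diagnosis ICC of 0.28 reported above can be read as the share of score variance attributable to differences between cases rather than to differences among the evaluated answers for the same case; under a random-intercept model, ICC = σ²_case / (σ²_case + σ²_residual). The sketch below only illustrates that idea. It assumes a balanced design (every case scored the same number of times) and uses the classic one-way ANOVA estimator ICC(1). The function name `icc1` and the toy scores are hypothetical, not the authors' code or data, and the paper's actual statistical model may differ.

```python
from statistics import mean


def icc1(scores: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for a balanced design (illustrative sketch).

    scores[i][j] is the score of case i under rating condition j
    (hypothetically, one evaluated answer per system or reader).
    Every case must have the same number k >= 2 of scores.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 cases")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced design with at least 2 scores per case")

    grand = mean(x for row in scores for x in row)
    case_means = [mean(row) for row in scores]

    # Between-case mean square: spread of case means around the grand mean.
    msb = k * sum((m - grand) ** 2 for m in case_means) / (n - 1)
    # Within-case mean square: spread of scores around their own case mean.
    msw = sum((x - m) ** 2 for row, m in zip(scores, case_means) for x in row) / (n * (k - 1))

    # ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Hypothetical toy data: 3 cases, each scored twice.
    print(icc1([[1, 3], [4, 6], [7, 9]]))
```

For the toy input `[[1, 3], [4, 6], [7, 9]]`, the case means (2, 5, 8) spread far more than the scores within each case, so `icc1` returns 0.8. Values near 0 mean that most variation lies within cases, and this estimator can even turn negative when within-case variation dominates.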

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98e6/12325206/73819c3c9bbc/fcell-13-1642539-g001.jpg

Similar articles

Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study.

J Med Internet Res. 2025 Jun 18;27:e69929. doi: 10.2196/69929.

Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study.

JMIR AI. 2025 Feb 24;4:e58670. doi: 10.2196/58670.

Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report.

J Med Internet Res. 2025 Jun 11;27:e72638. doi: 10.2196/72638.

Knowledge Graph-Enhanced Deep Learning Model (H-SYSTEM) for Hypertensive Intracerebral Hemorrhage: Model Development and Validation.

J Med Internet Res. 2025 Jun 12;27:e66055. doi: 10.2196/66055.

Ophthalmological Question Answering and Reasoning Using OpenAI o1 vs Other Large Language Models.

JAMA Ophthalmol. 2025 Jul 31. doi: 10.1001/jamaophthalmol.2025.2413.

Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: Mixed Methods Study.

JMIR Med Inform. 2025 Jul 24;13:e76128. doi: 10.2196/76128.

Evaluating GPT-4o in infectious disease diagnostics and management: A comparative study with residents and specialists on accuracy, completeness, and clinical support potential.

Digit Health. 2025 Jul 7;11:20552076251355797. doi: 10.1177/20552076251355797. eCollection 2025 Jan-Dec.

Evaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach.

J Med Internet Res. 2025 Jul 30;27:e74142. doi: 10.2196/74142.

Development and evaluation of a retrieval-augmented large language model framework for enhancing endodontic education.

Int J Med Inform. 2025 Nov;203:106006. doi: 10.1016/j.ijmedinf.2025.106006. Epub 2025 Jun 3.
