Zhuang Yijing, Fang Dong, Li Pengfeng, Bai Bingyu, Hei Xiangqing, Feng Lujia, Li Wangting, Zhang Shaochong
Shenzhen Eye Hospital, Shenzhen Eye Institute, Jinan University, Shenzhen, Guangdong, China.
Front Cell Dev Biol. 2025 Jul 23;13:1642539. doi: 10.3389/fcell.2025.1642539. eCollection 2025.
Although large language models (LLMs) show significant potential in clinical practice, accurate diagnosis and treatment planning in ophthalmology require multimodal integration of imaging, clinical history, and guideline-based knowledge. Current LLMs predominantly focus on unimodal language tasks and are limited in specialized ophthalmic diagnosis by domain knowledge gaps, hallucination risks, and inadequate alignment with clinical workflows. This study introduces a structured reasoning agent (ReasonAgent) that integrates a multimodal visual analysis module, a knowledge retrieval module, and a diagnostic reasoning module to address these limitations in ophthalmic decision-making. Validated on 30 real-world ophthalmic cases (27 common and 3 rare diseases), ReasonAgent demonstrated diagnostic accuracy comparable to that of ophthalmology residents (estimate = -0.07, p = 0.65). In treatment planning, however, it significantly outperformed both GPT-4o (estimate = 0.49, p = 0.01) and residents (estimate = 1.71, p < 0.001), particularly in rare disease scenarios (all p < 0.05). While GPT-4o showed vulnerabilities in rare cases (90.48% low diagnostic scores), ReasonAgent's hybrid design mitigated errors through structured reasoning. Statistical analysis identified significant case-level heterogeneity (diagnosis ICC = 0.28), highlighting the need for domain-specific AI solutions in complex clinical contexts. This framework establishes a novel paradigm for domain-specific AI in real-world clinical practice and demonstrates the potential of modular architectures to improve decision fidelity through human-aligned reasoning pathways.