Wan Xiao-Han, Liu Mei-Xia, Zhang Yan, Kou Guan-Jun, Xu Lei-Qi, Liu Han, Yang Xiao-Yun, Zuo Xiu-Li, Li Yan-Qing
Department of Gastroenterology, Qilu Hospital of Shandong University, Jinan 250012, Shandong Province, China.
World J Gastroenterol. 2025 Aug 21;31(31):109948. doi: 10.3748/wjg.v31.i31.109948.
Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks.
To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases.
In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially performs multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. It was evaluated with objective metrics, which assess the reliability and comprehensiveness of model-generated results, and with subjective expert review, which examines the framework's effectiveness in assisting physicians.
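The four-tier sequence described above can be sketched as a simple pipeline in which each tier wraps a separate model call. This is an illustrative sketch only, not the authors' implementation: all class names, function names, and the placeholder model responses are hypothetical, and in the real framework each tier would invoke a distinct large model.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    """A patient case with the three modalities the abstract names."""
    history: str
    labs: str
    imaging: str


def extract_information(case: CaseRecord) -> dict:
    # Tier 1: multimodal information extraction. In DeepGut this tier
    # would call a multimodal model; here we just collect the fields.
    return {"history": case.history, "labs": case.labs, "imaging": case.imaging}


def build_logic_chain(findings: dict) -> list:
    # Tier 2: logical "chain" construction, linking findings into an
    # ordered reasoning sequence (placeholder formatting only).
    return [f"{modality}: {finding}" for modality, finding in findings.items()]


def generate_recommendations(chain: list) -> dict:
    # Tier 3: diagnostic and treatment suggestion generation.
    # A real system would prompt an LLM with the chain; we return stubs.
    return {
        "diagnosis": "placeholder diagnosis",
        "treatment": "placeholder treatment plan",
        "evidence": chain,
    }


def analyze_risk(recommendations: dict) -> dict:
    # Tier 4: risk analysis over the generated suggestions, flagging
    # the clinical or legal risks the abstract mentions.
    recommendations["risk_notes"] = "review for contraindications and legal risk"
    return recommendations


def deepgut_pipeline(case: CaseRecord) -> dict:
    """Run the four tiers in sequence, each consuming the previous output."""
    findings = extract_information(case)
    chain = build_logic_chain(findings)
    recommendations = generate_recommendations(chain)
    return analyze_risk(recommendations)
```

The key design point the abstract reports is that the tiers run sequentially, each model consuming the previous tier's output; this also explains the observed growth in input and output token counts, since every tier's output is fed forward as part of the next tier's input.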
The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing single-modality LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach increased input and output token counts, leading to higher computational costs and longer diagnostic times.
The framework successfully integrates multimodal diagnostic data, demonstrating the performance gains enabled by multimodal LLM collaboration and opening new horizons for the clinical application of artificial intelligence-assisted technology.