Yang Xintian, Li Tongxin, Wang Han, Zhang Rongchun, Ni Zhi, Liu Na, Zhai Huihong, Zhao Jianghai, Meng Fandong, Zhou Zhongyin, Tang Shanhong, Wang Limei, Wang Xiangping, Luo Hui, Ren Gui, Zhang Linhui, Kang Xiaoyu, Wang Jun, Bo Ning, Yang Xiaoning, Xue Weijie, Zhang Xiaoyin, Chen Ning, Guo Rui, Li Baiwen, Li Yajun, Liu Yaling, Zhang Tiantian, Liang Shuhui, Lv Yong, Nie Yongzhan, Fan Daiming, Zhao Lina, Pan Yanglin
State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers and National Clinical Research Center for Digestive Diseases, Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi'an, China.
Department of Pathology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
NPJ Digit Med. 2025 Feb 5;8(1):85. doi: 10.1038/s41746-025-01486-5.
Faced with challenging cases, doctors are increasingly seeking diagnostic advice from large language models (LLMs). This study compares the ability of LLMs and human physicians to diagnose challenging cases. An offline dataset of 67 challenging cases with primary gastrointestinal symptoms was used to solicit possible diagnoses from seven LLMs and 22 gastroenterologists. The diagnoses by Claude 3.5 Sonnet covered the highest proportion of instructive diagnoses (76.1%; 95% confidence interval [CI], 70.6%-80.9%), significantly surpassing every gastroenterologist (p < 0.05 for all). Claude 3.5 Sonnet also achieved a significantly higher coverage rate than gastroenterologists using search engines or other traditional resources (76.1% [95% CI, 70.6%-80.9%] vs. 45.5% [95% CI, 40.7%-50.4%], p < 0.001). The study highlights that advanced LLMs may give gastroenterologists an instructive, time-saving, and cost-effective range of candidate diagnoses in challenging cases.
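The coverage rates above are proportions reported with 95% confidence intervals. As a minimal sketch of how such an interval can be computed, the snippet below uses the Wilson score method. This is an assumption for illustration only: the abstract does not state which interval method the authors used, and the function name `wilson_interval` and the counts in the usage line are hypothetical, not taken from the study.

```python
import math
from typing import Tuple


def wilson_interval(covered: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion covered/total.

    Assumption: z = 1.96 approximates a two-sided 95% interval.
    The method is illustrative; the abstract does not name the one used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must be between 0 and total")

    p = covered / total                      # observed coverage rate
    denom = 1 + z * z / total                # shared normalising term
    center = (p + z * z / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts, NOT study data: 30 of 40 instructive diagnoses covered.
low, high = wilson_interval(30, 40)
print(f"coverage = {30 / 40:.1%}, 95% CI = [{low:.1%}, {high:.1%}]")
```

The Wilson interval is chosen here because it behaves sensibly for proportions near 0 or 1 and for modest sample sizes; a different method would give slightly different bounds.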