Suppr 超能文献



The actual performance of large language models in providing liver cirrhosis-related information: A comparative study.

Authors

Li Yanqiu, Li Zhuojun, Li Jinze, Liu Long, Liu Yao, Zhu Bingbing, Shi Ke, Lu Yu, Li Yongqi, Zeng Xuanwei, Feng Ying, Wang Xianbo

Affiliations

Center for Integrative Medicine, Beijing Ditan Hospital, Capital Medical University, Beijing, China.

School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China.

Publication

Int J Med Inform. 2025 Sep;201:105961. doi: 10.1016/j.ijmedinf.2025.105961. Epub 2025 May 5.

DOI: 10.1016/j.ijmedinf.2025.105961
PMID: 40334344
Abstract

OBJECTIVE

With the increasing prevalence of large language models (LLMs) in the medical field, patients are increasingly turning to these advanced online resources for information about liver cirrhosis, a disease that requires long-term management. A comprehensive evaluation of the real-world performance of LLMs in such specialized medical areas is therefore necessary.

METHODS

This study evaluated the performance of four mainstream LLMs (ChatGPT-4o, Claude-3.5 Sonnet, Gemini-1.5 Pro, and Llama-3.1) in answering 39 questions related to liver cirrhosis. Information quality, readability, and accuracy were assessed using the Ensuring Quality Information for Patients (EQIP) tool, Flesch-Kincaid metrics, and consensus scoring. The LLMs' ability to simplify complex information and to self-correct was also assessed.
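The readability assessment above relies on Flesch-Kincaid metrics. As an illustration of how such a score is computed (this is not the authors' implementation), the standard Flesch-Kincaid grade-level formula can be sketched in Python, using a deliberately naive vowel-group syllable counter:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each maximal run of vowels counts as one syllable;
    # every word contributes at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Example: score a short patient-facing answer about cirrhosis.
answer = ("Cirrhosis is late-stage scarring of the liver. "
          "It can be caused by hepatitis or long-term alcohol use.")
print(f"FKGL: {flesch_kincaid_grade(answer):.1f}")
```

A score around 8 corresponds to an eighth-grade reading level, the ceiling usually recommended for patient materials; production readability tools use dictionary-based syllable counts rather than this vowel-group approximation.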

RESULTS

Significant performance differences were observed among the models. Gemini scored highest in providing high-quality information. While the readability of all four LLMs' answers was generally low, requiring college-level reading comprehension, the models exhibited strong capabilities in simplifying complex information. ChatGPT performed best in terms of accuracy, with a "Good" rating of 80%, higher than Claude (72%), Gemini (49%), and Llama (64%). All models received high scores for comprehensiveness. Each of the four LLMs demonstrated some degree of self-correction, improving the accuracy of initial answers when given simple prompts: ChatGPT's and Llama's accuracy improved by 100%, Claude's by 50%, and Gemini's by 67%.

CONCLUSION

LLMs demonstrate excellent performance in generating health information related to liver cirrhosis, yet they differ in answer quality, readability, and accuracy. Future research should focus on enhancing their value in healthcare, ultimately achieving reliable, accessible, and patient-centered dissemination of medical information.


Similar Articles

1. Large Language Models' Responses to Spinal Cord Injury: A Comparative Study of Performance.
J Med Syst. 2025 Mar 25;49(1):39. doi: 10.1007/s10916-025-02170-7.
2. Evaluating text and visual diagnostic capabilities of large language models on questions related to the Breast Imaging Reporting and Data System Atlas 5 edition.
Diagn Interv Radiol. 2025 Mar 3;31(2):111-129. doi: 10.4274/dir.2024.242876. Epub 2024 Sep 9.
3. Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9. Epub 2024 Dec 17.
4. Do large language model chatbots perform better than established patient information resources in answering patient questions? A comparative study on melanoma.
Br J Dermatol. 2025 Jan 24;192(2):306-315. doi: 10.1093/bjd/ljae377.
5. Comparative Performance of the Leading Large Language Models in Answering Complex Rhinoplasty Consultation Questions.
Facial Plast Surg Aesthet Med. 2025 Jan 15. doi: 10.1089/fpsam.2024.0206.
6. Enhancing responses from large language models with role-playing prompts: a comparative study on answering frequently asked questions about total knee arthroplasty.
BMC Med Inform Decis Mak. 2025 May 23;25(1):196. doi: 10.1186/s12911-025-03024-5.
7. Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study.
World J Gastroenterol. 2025 Jan 21;31(3):101092. doi: 10.3748/wjg.v31.i3.101092.
8. Appropriateness of Thyroid Nodule Cancer Risk Assessment and Management Recommendations Provided by Large Language Models.
J Imaging Inform Med. 2025 Mar 3. doi: 10.1007/s10278-025-01454-1.
9. Assessing large language models as assistive tools in medical consultations for Kawasaki disease.
Front Artif Intell. 2025 Mar 31;8:1571503. doi: 10.3389/frai.2025.1571503. eCollection 2025.