
Assessing the adherence of large language models to clinical practice guidelines in Chinese medicine: a content analysis.

Author Information

Zhao Weilong, Lai Honghao, Pan Bei, Huang Jiajie, Xia Danni, Bai Chunyang, Liu Jiayi, Liu Jianing, Jin Yinghui, Shang Hongcai, Liu Jianping, Shi Nannan, Liu Jie, Chen Yaolong, Estill Janne, Ge Long

Affiliations

Department of Health Policy and Management, School of Public Health, Lanzhou University, Lanzhou, China.

Evidence-Based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou, China.

Publication Information

Front Pharmacol. 2025 Jul 25;16:1649041. doi: 10.3389/fphar.2025.1649041. eCollection 2025.


DOI: 10.3389/fphar.2025.1649041
PMID: 40786055
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12331602/
Abstract

OBJECTIVE: Whether large language models (LLMs) can effectively facilitate the acquisition of Chinese medicine (CM) knowledge remains uncertain. This study aims to assess the adherence of LLMs to clinical practice guidelines (CPGs) in CM. METHODS: This cross-sectional study randomly selected ten CPGs in CM and constructed 150 questions across three categories: medication based on differential diagnosis (MDD), specific prescription consultation (SPC), and CM theory analysis (CTA). Eight LLMs (GPT-4o, Claude-3.5 Sonnet, Moonshot-v1, ChatGLM-4, DeepSeek-v3, DeepSeek-r1, Claude-4 Sonnet, and Claude-4 Sonnet Thinking) were evaluated using both English and Chinese queries. The main evaluation metrics were accuracy, readability, and use of safety disclaimers. RESULTS: Overall, DeepSeek-v3 and DeepSeek-r1 demonstrated superior performance in both English (median 5.00, interquartile range (IQR) 4.00-5.00 vs. median 5.00, IQR 3.70-5.00) and Chinese (both median 5.00, IQR 4.30-5.00), significantly outperforming all other models. All models achieved significantly higher accuracy in Chinese than in English responses (all p < 0.05). Accuracy also varied significantly across question categories, with MDD and SPC questions proving more challenging than CTA questions. English responses were less readable (mean Flesch Reading Ease score 32.7) than Chinese responses. Moonshot-v1 provided safety disclaimers at the highest rate (98.7% in English, 100% in Chinese). CONCLUSION: LLMs showed varying degrees of potential for supporting CM knowledge acquisition. The performance of DeepSeek-v3 and DeepSeek-r1 was satisfactory. Optimizing LLMs into effective tools for disseminating CM information is an important direction for future development.
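The readability comparison above uses the Flesch Reading Ease score, where a mean of 32.7 for English responses falls in the "difficult" (college-level) band. As a reference point only, the standard formula can be sketched in Python; the function name and sample counts below are illustrative and not taken from the paper:

```python
def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Standard Flesch Reading Ease formula.

    Higher scores mean easier text: 0-30 is 'very difficult',
    30-50 'difficult' (the paper's mean of 32.7 sits in this band).
    """
    return (206.835
            - 1.015 * (total_words / total_sentences)
            - 84.6 * (total_syllables / total_words))

# Illustrative counts for a short academic-style passage:
# long sentences and polysyllabic words drive the score down.
score = flesch_reading_ease(total_words=100, total_sentences=5,
                            total_syllables=180)
print(round(score, 1))
```

Shorter sentences and fewer syllables per word raise the score, which is why dense academic prose, like the English LLM responses evaluated here, tends to land in the low-30s range.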


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af6a/12331602/743970d41ee9/fphar-16-1649041-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af6a/12331602/530862a1af94/fphar-16-1649041-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af6a/12331602/8d3d6b0b2aba/fphar-16-1649041-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af6a/12331602/be6d4e3169a0/fphar-16-1649041-g003.jpg
