To take a different approach: Can large language models provide knowledge related to respiratory aspiration?

Authors

Niu Yirou, Fu Shuojin, Xuan Zehui, Kang Ruifu, Ren Zhifang, Jin Shuai, Wang Yanling, Xiao Qian

Affiliation

School of Nursing, Capital Medical University, Beijing, China.

Publication information

Digit Health. 2025 Jul 10;11:20552076251349616. doi: 10.1177/20552076251349616. eCollection 2025 Jan-Dec.

DOI: 10.1177/20552076251349616
PMID: 40656849
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12254663/

Abstract

OBJECTIVE

To investigate the performance (accuracy, comprehensiveness, consistency, and the necessary information ratio) of large language models (LLMs) in providing knowledge related to respiratory aspiration, and to explore the potential of using LLMs as training tools.

METHODS

This was an evaluative study that did not involve human subjects. Two LLMs (GPT-3.5 and GPT-4) were asked 36 questions about respiratory aspiration (32 objective questions and four subjective questions) in English and Chinese. Two experts scored the responses against gold standards derived from authoritative books. The accuracy of the two LLMs' responses to objective questions was compared using the chi-square test or Fisher's exact test. For subjective questions, the t-test or the Mann-Whitney U test was used to compare the two LLMs.
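As an illustration of the kind of comparison described for the objective questions, the sketch below computes a two-sided Fisher's exact test on a 2x2 table of correct and incorrect answers for two models. The counts are hypothetical and chosen only so that each row sums to 32 questions; they are not the study's data. The function name `fisher_exact_two_sided` is likewise an assumption made for this example, not something taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (here: two models); columns are the two
    outcomes (here: correct / incorrect). All margins are held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against float rounding).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, NOT taken from the study:
# model A: 28 correct, 4 incorrect; model B: 30 correct, 2 incorrect.
p_value = fisher_exact_two_sided(28, 4, 30, 2)
print(f"two-sided p = {p_value:.3f}")
```

Fisher's exact test is usually preferred over the chi-square test when expected cell counts are small, which is likely with only 32 questions and few incorrect answers per model.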

RESULTS

There was no significant difference between the ratings given by the two experts. Both LLMs scored high on accuracy for the objective questions. They also performed well on the subjective questions, showing high levels of accuracy, comprehensiveness, consistency, and necessary information ratio. No significant differences were found between the two LLMs in the accuracy of their English and Chinese responses to subjective questions (z = 0.331, P = 0.886; z = 1.703, P = 0.114), nor in the comprehensiveness of their English and Chinese responses (t = 0.787, P = 0.461; t = 1.175, P = 0.285).

CONCLUSIONS

LLMs demonstrated promising performance in delivering respiratory aspiration-related knowledge and showed promise as supportive tools in training, particularly when their limitations were well understood.
