The potential and pitfalls of using a large language model such as ChatGPT, GPT-4, or LLaMA as a clinical assistant.

Affiliations

Pangaea Data Limited, London, SE1 7LY, United Kingdom.

Data Science Institute, Imperial College London, London, SW7 2AZ, United Kingdom.

Publication information

J Am Med Inform Assoc. 2024 Sep 1;31(9):1884-1891. doi: 10.1093/jamia/ocae184.

DOI: 10.1093/jamia/ocae184
PMID: 39018498
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11339517/

Abstract

OBJECTIVES

This study aims to evaluate the utility of large language models (LLMs) in healthcare, focusing on their use in enhancing patient care through improved diagnostic and decision-making processes and as ancillary tools for healthcare professionals.

MATERIALS AND METHODS

We evaluated ChatGPT, GPT-4, and LLaMA in identifying patients with specific diseases using gold-labeled Electronic Health Records (EHRs) from the MIMIC-III database, covering four conditions: the prevalent diseases Chronic Obstructive Pulmonary Disease (COPD) and Chronic Kidney Disease (CKD), the rare condition Primary Biliary Cirrhosis (PBC), and the hard-to-diagnose condition Cancer Cachexia.

RESULTS

In patient identification, GPT-4 performed comparably to or better than the corresponding disease-specific Machine Learning models (F1-score ≥ 85%) on COPD, CKD, and PBC. GPT-4 excelled in the PBC use case, achieving a 4.23% higher F1-score than the disease-specific "Traditional Machine Learning" models. ChatGPT and LLaMA3 performed worse than GPT-4 across all diseases and almost all metrics. Few-shot prompts also helped ChatGPT, GPT-4, and LLaMA3 achieve higher precision and specificity, at the cost of lower sensitivity and Negative Predictive Value (NPV).
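
The metrics named above (precision, sensitivity, specificity, NPV, and F1-score) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the study's evaluation code. The `ConfusionCounts` type, the function name, and the two sets of counts are hypothetical and chosen only to illustrate how precision and specificity can rise while sensitivity and NPV fall; they are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # patients with the disease, flagged by the model
    fp: int  # patients without the disease, flagged by the model
    tn: int  # patients without the disease, not flagged
    fn: int  # patients with the disease, not flagged


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard patient-identification metrics."""
    precision = _ratio(c.tp, c.tp + c.fp)      # positive predictive value
    sensitivity = _ratio(c.tp, c.tp + c.fn)    # recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    npv = _ratio(c.tn, c.tn + c.fn)            # negative predictive value
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts only (assumed, not taken from the study).
zero_shot = ConfusionCounts(tp=90, fp=30, tn=70, fn=10)
few_shot = ConfusionCounts(tp=75, fp=10, tn=90, fn=25)

for name, counts in (("zero-shot", zero_shot), ("few-shot", few_shot)):
    metrics = classification_metrics(counts)
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

In this made-up example the second set of counts flags fewer patients overall, so false positives drop (raising precision and specificity) while false negatives grow (lowering sensitivity and NPV), which is the direction of the trade-off the abstract reports for few-shot prompts.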

DISCUSSION

The study highlights the potential and limitations of LLMs in healthcare. Errors, limited explanatory ability, and ethical concerns such as data privacy and model transparency suggest that these models would serve as supplementary tools in clinical settings. Future studies should improve training datasets and model designs so that LLMs gain better utility in healthcare.

CONCLUSION

The study shows that LLMs have the potential to assist clinicians with tasks such as patient identification, but false positives and false negatives must be mitigated before LLMs are adequate for real-world clinical assistance.
