Investigating the impact of innovative AI chatbot on post-pandemic medical education and clinical assistance: a comprehensive analysis.

Affiliations

Department of Surgery, Peninsula Health, Melbourne, Victoria, Australia.

Faculty of Science, Medicine, and Health, Monash University, Melbourne, Victoria, Australia.

Publication information

ANZ J Surg. 2024 Feb;94(1-2):68-77. doi: 10.1111/ans.18666. Epub 2023 Aug 21.

DOI:10.1111/ans.18666
PMID:37602755
Abstract

BACKGROUND

The COVID-19 pandemic has significantly disrupted the clinical experience and exposure of medical students and junior doctors. Integrating artificial intelligence (AI) into medical education has the potential to enhance learning and improve patient care. This study aimed to evaluate the effectiveness of three popular large language models (LLMs) as clinical decision-making support tools for junior doctors.

METHODS

A series of increasingly complex clinical scenarios was presented to ChatGPT, Google's Bard and Bing's AI. Their responses were evaluated against standard guidelines, for readability by the Flesch Reading Ease Score, the Flesch-Kincaid Grade Level, and the Coleman-Liau Index, and for reliability and suitability by the modified DISCERN score. Lastly, three experienced specialists rated the LLMs' outputs on a Likert scale for accuracy, informativeness, and accessibility.
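The three readability indices named above are closed-form formulas over simple text counts. The sketch below uses the standard published coefficients; it is only an illustration of the formulas, not the study's actual scoring pipeline, which the abstract does not describe. The function names are hypothetical, and the word, sentence, syllable, and letter counts are assumed to come from a separate tokenizer (syllable counting in particular is usually a heuristic).

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    """Coleman-Liau Index: grade level from letter and sentence density."""
    letters_per_100_words = letters / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


if __name__ == "__main__":
    # Made-up counts for illustration only: 100 words, 5 sentences,
    # 150 syllables, 450 letters. These are not data from the study.
    print(round(flesch_reading_ease(100, 5, 150), 2))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
    print(round(coleman_liau_index(450, 100, 5), 2))
```

Note that the scales run in opposite directions: a higher Flesch Reading Ease score indicates easier text, whereas a higher Flesch-Kincaid or Coleman-Liau value indicates text that requires more years of schooling to read.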

RESULTS

In terms of readability and reliability, ChatGPT stood out among the three LLMs, recording the highest scores in Flesch Reading Ease (31.2 ± 3.5), Flesch-Kincaid Grade Level (13.5 ± 0.7), Coleman-Liau Index (13) and DISCERN (62 ± 4.4). These results suggest that the medical advice given by ChatGPT was significantly more comprehensible and better aligned with clinical guidelines. Bard followed closely behind, with BingAI trailing in all categories. The only differences that were not statistically significant (P > 0.05) were between the readability indices of ChatGPT and Bard, and between the Flesch Reading Ease scores of ChatGPT/Bard and BingAI.

CONCLUSION

This study demonstrates the potential utility of LLMs in fostering self-directed and personalized learning, as well as bolstering clinical decision-making support for junior doctors. However, further development is needed before LLMs can be integrated into education.
