Exploring AI Hallucinations of ChatGPT: Reference Accuracy and Citation Relevance of ChatGPT Models and Training Conditions.

Authors

Cheng Adam, Nagesh Vikhashni, Eller Susan, Grant Vincent, Lin Yiqun

Affiliations

From the KidSIM Simulation Program (A.C.), Alberta Children's Hospital, Departments of Pediatrics and Emergency Medicine, Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Department of Pediatrics (V.N.), Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; Center for Immersive and Simulation-Based Learning (S.E.), Stanford School of Medicine, Stanford, CA; Departments of Pediatrics and Emergency Medicine (V.G.), Cumming School of Medicine, University of Calgary, Calgary, Alberta, Canada; and KidSIM Simulation Program (Y.L.), Alberta Children's Hospital, Calgary, Alberta, Canada.

Publication information

Simul Healthc. 2025 Aug 7. doi: 10.1097/SIH.0000000000000877.

DOI:10.1097/SIH.0000000000000877
PMID:40772773
Abstract

INTRODUCTION

Large language model-based generative AI tools, such as the Chat Generative Pre-trained Transformer (ChatGPT) platform, have been used to assist with writing academic manuscripts. Little is known about ChatGPT's ability to accurately cite relevant references in health care simulation-related scholarly manuscripts. In this study, we sought to: (1) determine the reference accuracy and citation relevance among health care simulation debriefing articles generated by 2 different models of ChatGPT and (2) determine if ChatGPT models can be trained with specific prompts to improve reference accuracy and citation relevance.

METHODS

The ChatGPT-4 and ChatGPT o1 models were asked to generate scholarly articles with appropriate references based upon 3 different article titles about health care simulation debriefing. Five articles with references were generated for each article title: 3 under ChatGPT-4 training conditions and 2 under ChatGPT o1 training conditions. Each article was assessed independently by 2 blinded reviewers for reference accuracy and citation relevance.

RESULTS

Fifteen articles were generated in total: 9 articles by ChatGPT-4 and 6 articles by ChatGPT o1. A total of 60.4% of the 303 references generated across 5 training conditions were classified as accurate, with no significant difference in reference accuracy between the 5 conditions. A total of 22.2% of the 451 citations were classified as highly relevant, with no significant difference in citation relevance across the 5 conditions.
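The arithmetic behind these figures can be checked with a short sketch. The abstract reports only percentages and totals, so the counts below are inferred under the assumption that each percentage was rounded to one decimal place; they are not quoted from the paper. The function and constant names are illustrative.

```python
# Sanity check of the figures reported in the abstract.
# Assumption: percentages were rounded to one decimal place. The abstract
# does not state the underlying counts, so they are inferred here.

TITLES = 3  # article titles used as prompts
CONDITIONS = {"ChatGPT-4": 3, "ChatGPT o1": 2}  # training conditions per model


def articles_per_model(titles, conditions):
    """One article is generated per (title, training condition) pair."""
    return {model: titles * n for model, n in conditions.items()}


def counts_matching(percent, total, decimals=1):
    """Return every integer count whose share of `total` rounds to `percent`."""
    return [
        k
        for k in range(total + 1)
        if abs(round(100 * k / total, decimals) - percent) < 1e-9
    ]


if __name__ == "__main__":
    per_model = articles_per_model(TITLES, CONDITIONS)
    print(per_model, "total:", sum(per_model.values()))
    print("accurate references:", counts_matching(60.4, 303))
    print("highly relevant citations:", counts_matching(22.2, 451))
```

Under that rounding assumption, the design yields 9 + 6 = 15 articles, and the only counts consistent with the reported percentages are 183 of 303 references and 100 of 451 citations.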

CONCLUSIONS

Among debriefing articles generated by ChatGPT-4 and ChatGPT o1, both ChatGPT models are unreliable with respect to reference accuracy and citation relevance. Reference accuracy and citation relevance for debriefing articles do not improve even with some degree of training built into ChatGPT prompts.

Similar articles

1
Exploring AI Hallucinations of ChatGPT: Reference Accuracy and Citation Relevance of ChatGPT Models and Training Conditions.
Simul Healthc. 2025 Aug 7. doi: 10.1097/SIH.0000000000000877.
2
Comparing AI-generated and human peer reviews: A study on 11 articles.
Hand Surg Rehabil. 2025 Jul 19:102225. doi: 10.1016/j.hansur.2025.102225.
3
Can ChatGPT be trusted as a resource for a scholarly article on treatment planning implant-supported prostheses?
J Prosthet Dent. 2025 Apr 9. doi: 10.1016/j.prosdent.2025.03.025.
4
Sexual Harassment and Prevention Training
5
Navigating the future of pediatric cardiovascular surgery: Insights and innovation powered by Chat Generative Pre-Trained Transformer (ChatGPT).
J Thorac Cardiovasc Surg. 2025 Feb 1. doi: 10.1016/j.jtcvs.2025.01.022.
6
Debriefing interventions for the prevention of psychological trauma in women following childbirth.
Cochrane Database Syst Rev. 2015 Apr 10;2015(4):CD007194. doi: 10.1002/14651858.CD007194.pub2.
7
Clinical symptoms, signs and tests for identification of impending and current water-loss dehydration in older people.
Cochrane Database Syst Rev. 2015 Apr 30;2015(4):CD009647. doi: 10.1002/14651858.CD009647.pub2.
8
"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?
Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.
9
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
10
Electric fans for reducing adverse health impacts in heatwaves.
Cochrane Database Syst Rev. 2012 Jul 11;2012(7):CD009888. doi: 10.1002/14651858.CD009888.pub2.