Walters, William H.; Wilder, Esther Isabelle
Mary Alice & Tom O'Malley Library, Manhattan College, Riverdale, NY, USA.
Department of Sociology, Lehman College, The City University of New York, Bronx, NY, USA.
Sci Rep. 2023 Sep 7;13(1):14045. doi: 10.1038/s41598-023-41032-5.
Although chatbots such as ChatGPT can facilitate cost-effective text generation and editing, factually incorrect responses (hallucinations) limit their utility. This study evaluates one particular type of hallucination: fabricated bibliographic citations that do not represent actual scholarly works. We used ChatGPT-3.5 and ChatGPT-4 to produce short literature reviews on 42 multidisciplinary topics, compiling data on the 636 bibliographic citations (references) found in the 84 papers. We then searched multiple databases and websites to determine the prevalence of fabricated citations, to identify errors in the citations to non-fabricated papers, and to evaluate adherence to APA citation format. Within this set of documents, 55% of the GPT-3.5 citations but just 18% of the GPT-4 citations are fabricated. Likewise, 43% of the real (non-fabricated) GPT-3.5 citations but just 24% of the real GPT-4 citations include substantive citation errors. Although GPT-4 is a major improvement over GPT-3.5, problems remain.