Department of Maxillo-Facial Surgery, Policlinico Le Scotte, University of Siena, Siena, Italy.
Phoniatrics and Audiology Unit, Department of Neuroscience DNS, University of Padova, Treviso, Italy.
Eur Arch Otorhinolaryngol. 2023 Nov;280(11):5129-5133. doi: 10.1007/s00405-023-08205-4. Epub 2023 Sep 8.
ChatGPT has gained popularity as a web application since its release in 2022. While artificial intelligence (AI) systems' potential in scientific writing is widely discussed, their reliability in reviewing literature and providing accurate references remains unexplored. This study examines the reliability of references generated by ChatGPT language models in the Head and Neck field.
Twenty clinical questions spanning different Head and Neck disciplines were used to prompt ChatGPT versions 3.5 and 4.0 to produce texts on the assigned topics. The references in these texts were categorized as "true," "erroneous," or "inexistent" according to whether they matched existing records in scientific databases.
ChatGPT 4.0 outperformed version 3.5 in terms of reference reliability. However, both versions tended to provide erroneous or non-existent references.
It is crucial to address this challenge to maintain the reliability of scientific literature. Journals and institutions should establish strategies and good-practice principles in the evolving landscape of AI-assisted scientific writing.