Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content: The Journal Essay Contest.

Author Affiliations

Hospital Israelita Albert Einstein and Departamento de Neurologia e Neurocirurgia, Universidade Federal de São Paulo, Brazil (G.S.S.).

Section of Cardiovascular Medicine (R.K.), Yale School of Medicine, New Haven, CT.

Publication Information

Stroke. 2024 Oct;55(10):2573-2578. doi: 10.1161/STROKEAHA.124.045012. Epub 2024 Sep 3.

Abstract

Artificial intelligence (AI) large language models (LLMs) now produce human-like general text and images. LLMs' ability to generate persuasive scientific essays that undergo evaluation under traditional peer review has not been systematically studied. To measure perceptions of quality and the nature of authorship, we conducted a competitive essay contest in 2024 with both human and AI participants. Human authors and 4 distinct LLMs generated essays on controversial topics in stroke care and outcomes research. A panel of Editorial Board members (mostly vascular neurologists), blinded to author identity and with varying levels of AI expertise, rated the essays for quality, persuasiveness, best in topic, and author type. Among 34 submissions (22 human and 12 LLM) scored by 38 reviewers, human and AI essays received mostly similar ratings, though AI essays were rated higher for composition quality. Author type was accurately identified only 50% of the time, with prior LLM experience associated with improved accuracy. In multivariable analyses adjusted for author attributes and essay quality, only persuasiveness was independently associated with odds of a reviewer assigning AI as author type (adjusted odds ratio, 1.53 [95% CI, 1.09-2.16]; P=0.01). In conclusion, a group of experienced editorial board members struggled to distinguish human versus AI authorship, with a bias against best in topic for essays judged to be AI generated. Scientific journals may benefit from educating reviewers on the types and uses of AI in scientific writing and developing thoughtful policies on the appropriate use of AI in authoring manuscripts.
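As a reader's aid: the reported adjusted odds ratio and its confidence interval correspond to a logistic-regression coefficient on the log-odds scale. The sketch below (illustrative only, not the authors' analysis code) back-derives the approximate standard error from the published 95% CI and shows the standard exponentiation that produces the OR and its bounds.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient (log-odds) and its
    standard error into an odds ratio with a Wald 95% confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# The reported adjusted OR of 1.53 (95% CI, 1.09-2.16) implies, on the
# log-odds scale, roughly the following coefficient and standard error:
beta = math.log(1.53)                                 # ~0.425
se = (math.log(2.16) - math.log(1.09)) / (2 * 1.96)   # ~0.175

or_, lo, hi = odds_ratio_ci(beta, se)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Exponentiating the coefficient recovers the OR of 1.53, and exponentiating beta ± 1.96·SE recovers the CI bounds to within rounding of the published figures.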


