
Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?

Author Information

Ozkara Burak Berksu, Boutet Alexandre, Comstock Bryan A, Van Goethem Johan, Huisman Thierry A G M, Ross Jeffrey S, Saba Luca, Shah Lubdha M, Wintermark Max, Castillo Mauricio

Affiliations

From the Department of Neuroradiology (B.B.O., M.W.), The University of Texas MD Anderson Cancer Center, Houston, Texas.

Joint Department of Medical Imaging (A.B.), University of Toronto, Toronto, Ontario, Canada.

Publication Information

AJNR Am J Neuroradiol. 2025 Mar 4;46(3):559-566. doi: 10.3174/ajnr.A8505.

Abstract

BACKGROUND AND PURPOSE

Artificial intelligence is capable of generating complex texts that may be indistinguishable from those written by humans. We aimed to evaluate the ability of GPT-4 to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing.

MATERIALS AND METHODS

Sixteen editorials from 8 journals were included. To generate the artificial intelligence (AI)-written editorials, summaries of the 16 human-written editorials were fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used: the raters evaluated the content of each article on a 1-5 Likert scale across specified metrics and then judged whether each editorial was written by a human or by AI. The articles were then evaluated in pairs to determine which article in each pair was generated by AI and which should be published. Finally, the articles were analyzed with an AI detector and checked for plagiarism.

RESULTS

The human-written articles had a median AI probability score of 2.0%, whereas the AI-written articles had a median score of 58%. The median similarity score among AI-written articles was 3%. In the unpaired setting, 58% of articles were correctly classified by authorship; accuracy increased to 70% in the paired setting. AI-written articles received slightly higher scores on most metrics. When stratified by perceived authorship, articles perceived as human-written were rated higher in most categories. In the paired setting, raters strongly preferred to publish the article they perceived as human-written (82%).

CONCLUSIONS

GPT-4 can write high-quality articles that iThenticate does not flag as plagiarized, that editors may fail to identify as AI-generated, and that AI detection tools detect only to a limited extent. Editors showed a positive bias toward articles perceived as human-written.


