
Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?

Author Information

Ozkara Burak Berksu, Boutet Alexandre, Comstock Bryan A, Van Goethem Johan, Huisman Thierry A G M, Ross Jeffrey S, Saba Luca, Shah Lubdha M, Wintermark Max, Castillo Mauricio

Affiliations

From the Department of Neuroradiology (B.B.O., M.W.), The University of Texas MD Anderson Cancer Center, Houston, Texas.

Joint Department of Medical Imaging (A.B.), University of Toronto, Toronto, Ontario, Canada.

Publication Information

AJNR Am J Neuroradiol. 2025 Mar 4;46(3):559-566. doi: 10.3174/ajnr.A8505.

Abstract

BACKGROUND AND PURPOSE

Artificial intelligence is capable of generating complex texts that may be indistinguishable from those written by humans. We aimed to evaluate the ability of GPT-4 to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing.

MATERIALS AND METHODS

Sixteen editorials from 8 journals were included. To generate the artificial intelligence (AI)-written editorials, summaries of the 16 human-written editorials were fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used: the raters evaluated the content of each article on a 1-5 Likert scale across specified metrics and then judged whether each editorial was written by a human or by AI. The articles were then evaluated in pairs to determine which article in each pair was generated by AI and which should be published. Finally, the articles were analyzed with an AI detector and checked for plagiarism.

RESULTS

The human-written articles had a median AI probability score of 2.0%, whereas the AI-written articles had a median score of 58%. The median similarity score among AI-written articles was 3%. In the unpaired setting, 58% of articles were correctly classified by authorship; accuracy increased to 70% in the paired setting. AI-written articles received slightly higher scores on most metrics. When stratified by perceived authorship, articles perceived as human-written were rated higher in most categories. In the paired setting, raters strongly preferred to publish the article they perceived as human-written (82%).

CONCLUSIONS

GPT-4 can write high-quality articles that iThenticate does not flag as plagiarized, that editors may fail to identify as AI-generated, and that AI detection tools detect only to a limited extent. Editors showed a positive bias toward articles perceived as human-written.


