
Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers.

Authors

Gao Catherine A, Howard Frederick M, Markov Nikolay S, Dyer Emma C, Ramesh Siddhi, Luo Yuan, Pearson Alexander T

Affiliations

Division of Pulmonary and Critical Care, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Section of Hematology/Oncology, Department of Medicine, University of Chicago, Chicago, IL, USA.

Publication

NPJ Digit Med. 2023 Apr 26;6(1):75. doi: 10.1038/s41746-023-00819-6.

Abstract

Large language models such as ChatGPT can produce increasingly realistic text, but little is known about the accuracy and integrity of using these models in scientific writing. We gathered fifty research abstracts from five high-impact-factor medical journals and asked ChatGPT to generate research abstracts based on their titles and journals. Most generated abstracts were detected using an AI output detector, 'GPT-2 Output Detector', with % 'fake' scores (higher meaning more likely to be generated) of median [interquartile range] 99.98% 'fake' [12.73%, 99.98%], compared with median 0.02% [IQR 0.02%, 0.09%] for the original abstracts. The AUROC of the AI output detector was 0.94. Generated abstracts scored lower than original abstracts when run through a plagiarism-detector website and iThenticate (higher scores meaning more matching text found). When given a mixture of original and generated abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, though abstracts they suspected were generated were vaguer and more formulaic. ChatGPT writes believable scientific abstracts, though with completely generated data. Depending on publisher-specific guidelines, AI output detectors may serve as an editorial tool to help maintain scientific standards. The boundaries of ethical and acceptable use of large language models to help scientific writing are still being discussed, and different journals and conferences are adopting varying policies.
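The detector comparison above reduces to a ranking question: given one generated and one original abstract, how often does the generated one receive the higher % 'fake' score? That probability is exactly what the reported AUROC of 0.94 measures. A minimal sketch of the rank-based (Mann-Whitney) computation, using hypothetical detector scores — the paper reports only medians and IQRs, so the numbers below are purely illustrative:

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive (generated) outscores a
    random negative (original); ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical % 'fake' scores, loosely shaped like the reported
# medians/IQRs (generated: median 99.98, IQR down to 12.73;
# original: median 0.02) -- not the study's actual per-abstract data.
generated = [99.98, 99.98, 12.73, 0.50]
original = [0.02, 0.09, 0.30, 5.00]

print(auroc(generated, original))  # 0.9375
```

Only one pair is mis-ranked (the 0.50 generated score versus the 5.00 original), giving 15/16 = 0.9375; with the study's real per-abstract scores this same calculation yields the reported 0.94.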

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/589b/10133283/befb62323f27/41746_2023_819_Fig1_HTML.jpg
