
Can we trust academic AI detective? Accuracy and limitations of AI-output detectors.

Authors

Erol Gökberk, Ergen Anıl, Gülşen Erol Büşra, Kaya Ergen Şebnem, Bora Tevfik Serhan, Çölgeçen Ali Deniz, Araz Büşra, Şahin Cansel, Bostancı Günsu, Kılıç İlayda, Macit Zeynep Birce, Sevgi Umut Tan, Güngör Abuzer

Affiliations

Department of Neurosurgery, Adiyaman Training and Research Hospital, Adiyaman, Türkiye.

Department of Neurosurgery, Derince Training and Research Hospital, Kocaeli, Türkiye.

Publication

Acta Neurochir (Wien). 2025 Aug 7;167(1):214. doi: 10.1007/s00701-025-06622-4.


DOI: 10.1007/s00701-025-06622-4
PMID: 40773066
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12331776/
Abstract

OBJECTIVE: This study evaluates the reliability and accuracy of AI-generated text detection tools in distinguishing human-authored academic content from AI-generated texts, highlighting potential challenges and ethical considerations in their application within the scientific community.

METHODS: This study analyzed the detectability of AI-generated academic content using abstracts and introductions created by ChatGPT versions 3.5, 4, and 4o, alongside human-written originals from the pre-ChatGPT era. Articles were sourced from four high-impact neurosurgery journals and grouped into four categories: originals and texts generated by ChatGPT 3.5, ChatGPT 4, and ChatGPT 4o. AI-output detectors (GPTZero, ZeroGPT, Corrector App) were employed to classify 1,000 texts as human- or AI-generated. Additionally, plagiarism checks were performed on AI-generated content to evaluate uniqueness.

RESULTS: A total of 250 human-authored articles and 750 ChatGPT-generated texts were analyzed using three AI-output detectors (Corrector, ZeroGPT, GPTZero). Human-authored texts consistently had the lowest AI-likelihood scores, while AI-generated texts exhibited significantly higher scores across all versions of ChatGPT (p < 0.01). Plagiarism detection revealed high originality for ChatGPT-generated content, with no significant differences among versions (p > 0.05). ROC analysis demonstrated that AI-output detectors effectively distinguished AI-generated content from human-written texts, with areas under the curve (AUC) ranging from 0.75 to 1.00 across all models. However, none of the detectors achieved 100% reliability in distinguishing AI-generated content.

CONCLUSIONS: While models like ChatGPT enhance content creation and efficiency, they raise ethical concerns, particularly in fields demanding trust and precision. AI-output detectors exhibit moderate to high success in distinguishing AI-generated texts, but false positives pose risks to researchers. Improving detector reliability and establishing clear policies on AI usage are critical to mitigate misuse while fully leveraging AI's benefits.
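The ROC analysis reported in the results can be illustrated with a minimal, self-contained sketch. AUC is equivalent to the probability that a randomly chosen AI-generated text receives a higher detector score than a randomly chosen human-written one (the Mann-Whitney interpretation). The scores below are invented examples, not the study's data, and the function is a generic rank-based AUC, not any specific detector's implementation.

```python
def auc(human_scores, ai_scores):
    """Rank-based AUC: the fraction of (human, AI) score pairs in which the
    AI-generated text scores strictly higher, counting ties as half."""
    wins = 0.0
    for h in human_scores:
        for a in ai_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(human_scores) * len(ai_scores))

# Invented detector outputs, on a 0-100 "AI likelihood" scale.
human = [5, 12, 8, 20, 3]   # hypothetical scores for human-written texts
ai = [85, 60, 92, 45, 70]   # hypothetical scores for ChatGPT-generated texts
print(auc(human, ai))  # 1.0: here every AI text outranks every human text
```

An AUC of 1.00 means perfect separation at some threshold, while the 0.75 lower bound reported in the study means a randomly chosen AI text outscores a randomly chosen human text only 75% of the time, which is why a fixed cutoff still produces the false positives the authors warn about.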


Figures (PMC):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/a5af382b9fe0/701_2025_6622_Fig1_HTML.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/73269d14a191/701_2025_6622_Fig2_HTML.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/bb3947534962/701_2025_6622_Fig3_HTML.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/1b675f841d4d/701_2025_6622_Fig4_HTML.jpg

Similar Articles

[1]
Can we trust academic AI detective? Accuracy and limitations of AI-output detectors.

Acta Neurochir (Wien). 2025-8-7

[2]
Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?

AJNR Am J Neuroradiol. 2025-3-4

[3]
ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.

Cochrane Evid Synth Methods. 2025-7-28

[4]
Defining the Boundaries of AI Use in Scientific Writing: A Comparative Review of Editorial Policies.

J Korean Med Sci. 2025-6-16

[5]
Comparative performance of ChatGPT, Gemini, and final-year emergency medicine clerkship students in answering multiple-choice questions: implications for the use of AI in medical education.

Int J Emerg Med. 2025-8-7

[6]
AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.

J Med Imaging Radiat Sci. 2025-3-28

[7]
Using AI to Write a Review Article Examining the Role of the Nervous System on Skeletal Homeostasis and Fracture Healing.

Curr Osteoporos Rep. 2024-2

[8]
Pharmacy meets AI: Effect of a drug information activity on student perceptions of generative artificial intelligence.

Curr Pharm Teach Learn. 2025-7-7

[9]
Figure plagiarism and manipulation, an under-recognised problem in academia.

Eur Radiol. 2025-8

[10]
The Ability of ChatGPT in Paraphrasing Texts and Reducing Plagiarism: A Descriptive Analysis.

JMIR Med Educ. 2024-7-8

References Cited in This Article

[1]
Knowledge, interest and perspectives on Artificial Intelligence in Neurosurgery. A global survey.

Brain Spine. 2024-12-9

[2]
Automatic Segmentation of Vestibular Schwannomas: A Systematic Review.

World Neurosurg. 2024-8

[3]
AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research.

Healthcare (Basel). 2024-4-13

[4]
Evaluation of the safety, accuracy, and helpfulness of the GPT-4.0 Large Language Model in neurosurgery.

J Clin Neurosci. 2024-5

[5]
Can AI pass the written European Board Examination in Neurological Surgery? - Ethical and practical issues.

Brain Spine. 2024-2-13

[6]
Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning's ability to provide diagnoses and treatment plans for example neuro-oncology cases.

Clin Neurol Neurosurg. 2024-4

[7]
Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study.

JMIR Med Educ. 2023-12-28

[8]
A Study on Distinguishing ChatGPT-Generated and Human-Written Orthopaedic Abstracts by Reviewers: Decoding the Discrepancies.

Cureus. 2023-11-21

[9]
The ChatGPT conundrum: Human-generated scientific manuscripts misidentified as AI creations by AI text detection tool.

J Pathol Inform. 2023-10-17

[10]
Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams.

Comput Biol Med. 2024-2
