


Can we trust academic AI detective? Accuracy and limitations of AI-output detectors.

Authors

Erol Gökberk, Ergen Anıl, Gülşen Erol Büşra, Kaya Ergen Şebnem, Bora Tevfik Serhan, Çölgeçen Ali Deniz, Araz Büşra, Şahin Cansel, Bostancı Günsu, Kılıç İlayda, Macit Zeynep Birce, Sevgi Umut Tan, Güngör Abuzer

Affiliations

Department of Neurosurgery, Adiyaman Training and Research Hospital, Adiyaman, Türkiye.

Department of Neurosurgery, Derince Training and Research Hospital, Kocaeli, Türkiye.

Publication

Acta Neurochir (Wien). 2025 Aug 7;167(1):214. doi: 10.1007/s00701-025-06622-4.

DOI: 10.1007/s00701-025-06622-4
PMID: 40773066
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12331776/
Abstract

OBJECTIVE

This study evaluates the reliability and accuracy of AI-generated text detection tools in distinguishing human-authored academic content from AI-generated texts, highlighting potential challenges and ethical considerations in their application within the scientific community.

METHODS

This study analyzed the detectability of AI-generated academic content using abstracts and introductions created by ChatGPT versions 3.5, 4, and 4o, alongside human-written originals from the pre-ChatGPT era. Articles were sourced from four high-impact neurosurgery journals and divided into four categories: human-written originals and texts generated by ChatGPT 3.5, ChatGPT 4, and ChatGPT 4o. AI-output detectors (GPTZero, ZeroGPT, Corrector App) were employed to classify 1,000 texts as human- or AI-generated. Additionally, plagiarism checks were performed on AI-generated content to evaluate uniqueness.

RESULTS

A total of 250 human-authored articles and 750 ChatGPT-generated texts were analyzed using three AI-output detectors (Corrector, ZeroGPT, GPTZero). Human-authored texts consistently had the lowest AI likelihood scores, while AI-generated texts exhibited significantly higher scores across all versions of ChatGPT (p < 0.01). Plagiarism detection revealed high originality for ChatGPT-generated content, with no significant differences among versions (p > 0.05). ROC analysis demonstrated that AI-output detectors effectively distinguished AI-generated content from human-written texts, with areas under the curve (AUC) ranging from 0.75 to 1.00 for all models. However, none of the detectors achieved 100% reliability in distinguishing AI-generated content.
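The ROC analysis above summarizes each detector's discrimination as an AUC between 0.75 and 1.00. The AUC has a simple probabilistic reading: it is the probability that a randomly chosen AI-generated text receives a higher AI-likelihood score than a randomly chosen human-written one. A minimal sketch of that computation, using hypothetical detector scores (not data from the paper):

```python
# Illustrative sketch only: the score values below are invented, not the
# study's data. AUC is computed via the Mann-Whitney pairwise-comparison
# interpretation rather than by tracing the full ROC curve.
human_scores = [5, 12, 8, 20, 15]    # hypothetical AI-likelihood scores, human texts
ai_scores = [70, 95, 88, 60, 99]     # hypothetical AI-likelihood scores, AI texts

def roc_auc(neg, pos):
    """Probability that a random positive outscores a random negative
    (ties count as half a win); this equals the area under the ROC curve."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(roc_auc(human_scores, ai_scores))  # 1.0: every AI text outscores every human text
```

An AUC of 1.00 means perfect separation at some threshold, while 0.75 means a quarter of random AI/human pairs are ranked the wrong way round, which is why the authors note that no detector was fully reliable.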

CONCLUSIONS

While models like ChatGPT enhance content creation and efficiency, they raise ethical concerns, particularly in fields demanding trust and precision. AI-output detectors exhibit moderate to high success in distinguishing AI-generated texts, but false positives pose risks to researchers. Improving detector reliability and establishing clear policies on AI usage are critical to mitigate misuse while fully leveraging AI's benefits.


Figures (from the PMC full text):
Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/a5af382b9fe0/701_2025_6622_Fig1_HTML.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/73269d14a191/701_2025_6622_Fig2_HTML.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/bb3947534962/701_2025_6622_Fig3_HTML.jpg
Fig 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2c1/12331776/1b675f841d4d/701_2025_6622_Fig4_HTML.jpg

Similar articles

1
Can we trust academic AI detective? Accuracy and limitations of AI-output detectors.
Acta Neurochir (Wien). 2025 Aug 7;167(1):214. doi: 10.1007/s00701-025-06622-4.
2
Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?
AJNR Am J Neuroradiol. 2025 Mar 4;46(3):559-566. doi: 10.3174/ajnr.A8505.
3
ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.
Cochrane Evid Synth Methods. 2025 Jul 28;3(4):e70037. doi: 10.1002/cesm.70037. eCollection 2025 Jul.
4
Defining the Boundaries of AI Use in Scientific Writing: A Comparative Review of Editorial Policies.
J Korean Med Sci. 2025 Jun 16;40(23):e187. doi: 10.3346/jkms.2025.40.e187.
5
Comparative performance of ChatGPT, Gemini, and final-year emergency medicine clerkship students in answering multiple-choice questions: implications for the use of AI in medical education.
Int J Emerg Med. 2025 Aug 7;18(1):146. doi: 10.1186/s12245-025-00949-6.
6
AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.
J Med Imaging Radiat Sci. 2025 Mar 28;56(4):101896. doi: 10.1016/j.jmir.2025.101896.
7
Using AI to Write a Review Article Examining the Role of the Nervous System on Skeletal Homeostasis and Fracture Healing.
Curr Osteoporos Rep. 2024 Feb;22(1):217-221. doi: 10.1007/s11914-023-00854-y. Epub 2024 Jan 13.
8
Pharmacy meets AI: Effect of a drug information activity on student perceptions of generative artificial intelligence.
Curr Pharm Teach Learn. 2025 Jul 7;17(10):102439. doi: 10.1016/j.cptl.2025.102439.
9
Figure plagiarism and manipulation, an under-recognised problem in academia.
Eur Radiol. 2025 Aug;35(8):4518-4521. doi: 10.1007/s00330-025-11426-2. Epub 2025 Feb 13.
10
The Ability of ChatGPT in Paraphrasing Texts and Reducing Plagiarism: A Descriptive Analysis.
JMIR Med Educ. 2024 Jul 8;10:e53308. doi: 10.2196/53308.
