
AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing.

Authors

Popkov Andrey A, Barrett Tyson S

Affiliations

Highmark Health, Pittsburgh, PA, USA.

Contigo Health, LLC, a subsidiary of Premier, Inc, Charlotte, NC, USA.

Publication information

Account Res. 2024 Mar 22:1-17. doi: 10.1080/08989621.2024.2331757.

DOI: 10.1080/08989621.2024.2331757
PMID: 38516933
Abstract

Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by "ChatGPT" and 100 by "Claude"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
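As a rough illustration of the error measures the abstract refers to, the sketch below computes a false positive rate (human text flagged as AI) and a false negative rate (AI text passed as human) from per-document detector scores. The scores, the 0.5 decision threshold, and the function name `error_rates` are all hypothetical placeholders chosen for the example. They are not the study's data, its detectors' outputs, or its method.

```python
from statistics import median


def error_rates(human_scores, ai_scores, threshold=0.5):
    """Return (false_positive_rate, false_negative_rate).

    Each score is the proportion of a text that a detector labels as
    AI-generated. A text counts as "flagged" when its score reaches the
    threshold. The threshold is an assumption made for this sketch.
    """
    false_positives = sum(score >= threshold for score in human_scores)
    false_negatives = sum(score < threshold for score in ai_scores)
    return (
        false_positives / len(human_scores),
        false_negatives / len(ai_scores),
    )


# Hypothetical scores, invented purely for illustration.
human_scores = [0.10, 0.30, 0.60, 0.20]  # human-written articles
ai_scores = [0.90, 0.40, 0.80, 0.70]     # chatbot-generated texts

fpr, fnr = error_rates(human_scores, ai_scores)
print(f"median score on human text: {median(human_scores):.2f}")
print(f"false positive rate: {fpr:.2f}")
print(f"false negative rate: {fnr:.2f}")
```

A median score on human-written text, like the 27.2% the abstract reports for the free detector, summarizes how much human text is typically flagged. The two rates depend on wherever the decision threshold is placed.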

