Schneider Daniel M, Mishra Akash, Gluski Jacob, Shah Harshal, Ward Max, Brown Ethan D, Sciubba Daniel M, Lo Sheng-Fu L
Department of Neurosurgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Manhasset, USA.
Cureus. 2025 Feb 16;17(2):e79086. doi: 10.7759/cureus.79086. eCollection 2025 Feb.
INTRODUCTION: With the rapid proliferation of artificial intelligence (AI) tools, important questions have been raised about their applicability to manuscript preparation. This study explores the methodological challenges of detecting AI-generated content in neurosurgical publications, using existing detection tools to highlight both the presence of AI-generated content and the fundamental limitations of current detection approaches.

METHODS: We analyzed 100 randomly selected manuscripts published between 2023 and 2024 in high-impact neurosurgery journals, using a two-tiered approach to identify potentially AI-generated text. Text was classified as AI-generated only if a robustly optimized bidirectional encoder representations from transformers pretraining approach (RoBERTa)-based AI classification tool returned a positive classification and the text's perplexity score was less than 100. Chi-square tests assessed differences in the prevalence of AI-generated text across manuscript sections, topics, and types. To reduce bias introduced by the more structured nature of abstracts, a subgroup analysis excluding abstracts was also conducted.

RESULTS: Approximately one in five (20%) manuscripts contained sections flagged as AI-generated, with abstracts and methods sections disproportionately represented. After excluding abstracts, the association between section type and AI-generated content was no longer statistically significant.

CONCLUSION: Our findings highlight both the growing integration of AI into manuscript preparation and a critical challenge for academic publishing: as AI language models become increasingly sophisticated, traditional detection methods become less reliable. This suggests shifting the focus from detection to transparency, emphasizing the development of clear disclosure policies and ethical guidelines for AI use in academic writing.
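The following is a minimal Python sketch of the two-tiered decision rule described in the Methods (positive RoBERTa-based classification AND perplexity below 100). The abstract does not name the specific checkpoints used, so the detector model ("openai-community/roberta-base-openai-detector"), the perplexity model (GPT-2), and the positive-label names are assumptions for illustration only, not the authors' exact pipeline.

    # Sketch of the study's two-tier flagging rule (assumed models; see note above).
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast, pipeline

    # Tier 1: RoBERTa-based AI-text classifier (assumed public checkpoint).
    detector = pipeline(
        "text-classification",
        model="openai-community/roberta-base-openai-detector",
    )

    # Tier 2: perplexity under a causal language model (GPT-2 assumed).
    ppl_tok = GPT2TokenizerFast.from_pretrained("gpt2")
    ppl_model = GPT2LMHeadModel.from_pretrained("gpt2")
    ppl_model.eval()

    def perplexity(text: str) -> float:
        """Perplexity = exp(mean negative log-likelihood of the tokens)."""
        enc = ppl_tok(text, return_tensors="pt", truncation=True, max_length=1024)
        with torch.no_grad():
            out = ppl_model(**enc, labels=enc["input_ids"])
        return math.exp(out.loss.item())

    def flag_as_ai(text: str, ppl_threshold: float = 100.0) -> bool:
        """Flag text only if BOTH tiers agree: a positive classifier call
        and a perplexity score below the threshold used in the study."""
        label = detector(text, truncation=True)[0]["label"]
        classifier_positive = label.lower() in {"fake", "ai", "label_1"}  # assumed label names
        return classifier_positive and perplexity(text) < ppl_threshold

Under this rule, a manuscript section is counted as AI-generated only when both signals agree; relaxing either condition would flag more text, which is why the conjunction is the conservative choice the abstract describes.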