Département universitaire de chirurgie orthopédique, université de Lille, CHU de Lille, 59000 Lille, France; Service de chirurgie orthopédique, centre hospitalier universitaire (CHU) de Lille, hôpital Roger-Salengro, place de Verdun, 59000 Lille, France.
Service de chirurgie du membre supérieur, Hautepierre 2, CHRU Strasbourg, 1, avenue Molière, 67200 Strasbourg, France.
Orthop Traumatol Surg Res. 2023 Dec;109(8):103694. doi: 10.1016/j.otsr.2023.103694. Epub 2023 Sep 29.
Introduction: The use of artificial intelligence (AI) is soaring, and the launch of ChatGPT in November 2022 has accelerated this trend. This "chatbot" can generate complete scientific articles, with a risk of plagiarism by mining existing data, or of outright fraud by fabricating studies with no real data at all. Tools exist to detect AI-generated content in publications, but to our knowledge they have not been systematically assessed on articles published in scientific journals. We therefore conducted a retrospective study of articles published in Orthopaedics & Traumatology: Surgery & Research (OTSR): firstly, to screen for AI-generated content before and after the publicized launch of ChatGPT; secondly, to assess whether AI was used to generate content more often in some countries than in others; thirdly, to determine whether plagiarism rate correlated with AI generation; and lastly, to determine whether elements other than text generation, notably the translation procedure, could raise suspicion of AI use.
Hypothesis: The rate of AI use increased after the publicized launch of ChatGPT v3.5 in November 2022.
Material and methods: In all, 425 articles published between February 2022 and September 2023 (221 before and 204 after November 1, 2022) underwent ZeroGPT assessment of the level of AI generation in the final English-language version (abstract and body of the article). Two scores were obtained: probability of AI generation, in six grades from "Human" to "AI"; and percentage of AI-generated text. Plagiarism was assessed at submission using the iThenticate application. Articles submitted in French were assessed in the English-language version produced by a human translator, with comparison against automatic translation by Google Translate and DeepL.
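As a purely illustrative aid, and not part of the published protocol, the sketch below shows one way the per-article data described above could be organized for analysis: each record carries a publication date, the ZeroGPT percentages for abstract and body, and the iThenticate similarity score, and the cohort is split at the November 1, 2022 cutoff. All field names, the pooling rule for body plus abstract, and any values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Hypothetical per-article record mirroring the scores described in the methods:
# ZeroGPT "% AI-generated" for abstract and body, plus the iThenticate
# similarity ("plagiarism") score collected at submission.
@dataclass
class ArticleScores:
    published: date          # publication date
    ai_abstract_pct: float   # ZeroGPT % AI generation, abstract
    ai_body_pct: float       # ZeroGPT % AI generation, body of the article
    similarity_pct: float    # iThenticate similarity score

CUTOFF = date(2022, 11, 1)   # publicized launch of ChatGPT v3.5

def split_by_period(articles: List[ArticleScores]) -> Tuple[List[ArticleScores], List[ArticleScores]]:
    """Split the cohort into pre- and post-ChatGPT publication periods."""
    before = [a for a in articles if a.published < CUTOFF]
    after = [a for a in articles if a.published >= CUTOFF]
    return before, after

def combined_ai_pct(a: ArticleScores) -> float:
    """Crude body+abstract figure; the abstract does not state how the two
    scores were pooled, so a simple mean is used here as a placeholder."""
    return (a.ai_abstract_pct + a.ai_body_pct) / 2.0
```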
Results: AI-generated text was detected mainly in abstracts, with a 10.1% rate of "AI" or "considerable AI" generation, compared to only 1.9% for the body of the article and 5.6% for body plus abstract. Comparison of the periods before and after November 2022 found an increase in AI generation for body plus abstract, from 10.30±15.95% (range, 0-100%) to 15.64±19.8% (range, 0-99.93%) (p < 0.04; non-significant for abstracts alone). AI scores differed between types of article: 14.9% for original articles versus 9.8% for reviews (p<0.01). The highest rates of probable AI generation were found in articles from Japan, China, South America and English-speaking countries (p<0.0001). Plagiarism rates did not increase between the two study periods and were unrelated to AI rates. On the other hand, when articles were classified as "suspected" of AI generation (AI rate ≥ 20%) or "non-suspected" (AI rate < 20%), the "similarity" score was higher in suspect articles: 25.7±13.23% (range, 10-69%) versus 16.28±10% (range, 0-79%) (p < 0.001). In the body of the article, use of translation software was associated with higher AI rates than use of a human translator: 3.5±5% for human translation, versus 18±10% for Google Translate and 21.9±11% for DeepL (p < 0.001).
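The before/after comparison reported above (10.30±15.95% versus 15.64±19.8%) can be illustrated with a minimal sketch of an unpaired two-sample test on per-article AI percentages. The abstract does not state which statistical test the authors used, so the Welch t-test below is only an assumed stand-in, and the input lists are placeholders rather than study data.

```python
from statistics import mean, stdev
from scipy import stats

def compare_periods(before_pct, after_pct, alpha=0.05):
    """Compare per-article AI-generation percentages between the two
    publication periods. Welch's t-test is assumed here; the study itself
    does not specify the test used."""
    t, p = stats.ttest_ind(before_pct, after_pct, equal_var=False)
    print(f"before: {mean(before_pct):.2f} ± {stdev(before_pct):.2f} %")
    print(f"after:  {mean(after_pct):.2f} ± {stdev(after_pct):.2f} %")
    verdict = "significant" if p < alpha else "non-significant"
    print(f"Welch t = {t:.2f}, p = {p:.4f} ({verdict} at alpha = {alpha})")

# Usage with placeholder values only (not the study data):
# compare_periods([4.0, 12.5, 0.0, 22.1], [18.3, 9.7, 31.0, 14.2])
```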
Discussion: The present study revealed an increasing rate of AI use in articles published in OTSR. AI grades differed according to type of article and country of origin. Use of translation software increased the AI grade. In the long run, use of ChatGPT incurs a risk of plagiarism and scientific misconduct, and AI-generated text needs to be detected and flagged by a digital tag on any robot-generated content.
Level of evidence: III; case-control study.