Nicholas Lo Vecchio
Independent researcher, Marseille, France.
Res Integr Peer Rev. 2025 Apr 7;10(1):4. doi: 10.1186/s41073-025-00161-3.
While some recent studies have looked at large language model (LLM) use in peer review at the corpus level, to date there have been few examinations of instances of AI-generated reviews in their social context. The goal of this first-person account is to present my experience of receiving two anonymous peer review reports that I believe were produced using generative AI, as well as lessons learned from that experience.
This case report describes the timeline of the incident and the subsequent actions taken by me and by the journal. Supporting evidence includes text patterns in the reports, online AI detection tools and ChatGPT simulations; recommendations are offered for others who may find themselves in a similar situation. The primary research limitation of this article is that it is based on one individual's personal experience.
After I alleged the use of generative AI in December 2023, two months of back-and-forth ensued between me and the journal, leading to my withdrawal of the submission. The journal denied any ethical breach, without taking an explicit position on the allegations of LLM use. Based on this experience, I recommend that authors engage in dialogue with journals on AI use in peer review prior to article submission; where undisclosed AI use is suspected, authors should proactively amass evidence, request an investigation protocol, escalate the matter as needed, involve independent bodies where possible, and share their experience with fellow researchers.
Journals need to promptly adopt transparent policies on LLM use in peer review, in particular requiring disclosure. Open peer review, in which the identities of all stakeholders are declared, might safeguard against LLM misuse, but accountability in the AI era is needed from all parties.