Ágústsdóttir Dagný Halla, Rosenberg Jacob, Baker Jason Joe
Center for Perioperative Optimization, Department of Surgery, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark.
Cochrane Colorectal Group, Herlev and Gentofte Hospital, Herlev, Denmark.
Cochrane Evid Synth Methods. 2025 Jul 28;3(4):e70037. doi: 10.1002/cesm.70037. eCollection 2025 Jul.
INTRODUCTION: Plain language summaries in Cochrane reviews are designed to present key information in a way that is understandable to individuals without a medical background. Despite Cochrane's author guidelines, these summaries often fail to achieve their intended purpose. Studies show that they are generally difficult to read and vary in their adherence to the guidelines. Artificial intelligence is increasingly used in medicine and academia, and its potential is being tested in various roles. This study aimed to investigate whether ChatGPT-4o could produce plain language summaries that are as good as the plain language summaries already published in Cochrane reviews.
METHODS: We conducted a randomized, single-blinded study of 36 plain language summaries: 18 human written and 18 generated by ChatGPT-4o, with both versions written for the same Cochrane reviews. The sample size was calculated to be 36, and each summary was evaluated four times: twice by members of a Cochrane editorial group and twice by laypersons. The summaries were assessed in three ways. First, all assessors rated the summaries for informativeness, readability, and level of detail on a Likert scale from 1 to 10. They were also asked whether they would submit the summary and whether they could identify who had written it. Second, members of a Cochrane editorial group assessed the summaries using a checklist based on Cochrane's guidelines for plain language summaries, with scores ranging from 0 to 10. Finally, the readability of the summaries was analyzed using objective tools such as Lix and Flesch-Kincaid scores. Randomization and allocation to either ChatGPT-4o or human written summaries were conducted using random.org's random sequence generator, and assessors were blinded to the authorship of the summaries.
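As an illustration of the objective readability measures named in the methods, the following is a minimal sketch of a Lix calculation, using the standard definition: average sentence length plus the percentage of words longer than six letters. It is not the tool the authors used, which the abstract does not name. The function name `lix`, the tokenization rule, and the choice to end sentences only at `.`, `!`, or `?` are assumptions made for this example.

```python
import re


def lix(text: str) -> float:
    """Return the Lix readability score of `text` (higher means harder to read).

    Lix = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    """
    # Assumption: a word is any run of letters, so hyphenated
    # compounds are counted as several words.
    words = re.findall(r"[^\W\d_]+", text)
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


if __name__ == "__main__":
    sample = "The cat sat. The elephant wandered."
    print(round(lix(sample), 1))
```

For the sample text there are 6 words, 2 sentences, and 2 long words, so the score is 3 + 100 × 2/6. Flesch-Kincaid is omitted here because it additionally needs a syllable counter.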
RESULTS: The plain language summaries generated by ChatGPT-4o scored 1 point higher on information (P < .001) and level of detail (P = .004), and 2 points higher on readability (P = .002), compared to human written summaries. Lix and Flesch-Kincaid scores were high for both groups of summaries, though the ChatGPT summaries were slightly easier to read (P < .001). Assessors found it difficult to distinguish between ChatGPT and human written summaries, with only 20% correctly identifying ChatGPT generated text. ChatGPT summaries were preferred for submission over the human written summaries (64% vs. 36%, P < .001).
CONCLUSION: ChatGPT-4o shows promise in creating plain language summaries for Cochrane reviews at least as well as humans and in some cases slightly better. This study suggests ChatGPT-4o could become a tool for drafting easy-to-understand plain language summaries for Cochrane reviews with a quality approaching or matching that of human authors.
CLINICAL TRIAL REGISTRATION AND PROTOCOL: Available at https://osf.io/aq6r5.