ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.

Author Information

Ágústsdóttir Dagný Halla, Rosenberg Jacob, Baker Jason Joe

Affiliations

Center for Perioperative Optimization, Department of Surgery, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark.

Cochrane Colorectal Group, Herlev and Gentofte Hospital, Herlev, Denmark.

Publication Information

Cochrane Evid Synth Methods. 2025 Jul 28;3(4):e70037. doi: 10.1002/cesm.70037. eCollection 2025 Jul.


DOI: 10.1002/cesm.70037
PMID: 40727555
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12302524/
Abstract

INTRODUCTION: Plain language summaries in Cochrane reviews are designed to present key information in a way that is understandable to individuals without a medical background. Despite Cochrane's author guidelines, these summaries often fail to achieve their intended purpose. Studies show that they are generally difficult to read and vary in their adherence to the guidelines. Artificial intelligence is increasingly used in medicine and academia, with its potential being tested in various roles. This study aimed to investigate whether ChatGPT-4o could produce plain language summaries that are as good as the plain language summaries already published in Cochrane reviews.

METHODS: We conducted a randomized, single-blinded study with a total of 36 plain language summaries: 18 human-written and 18 ChatGPT-4o-generated summaries, where both versions were for the same Cochrane reviews. The sample size was calculated to be 36, and each summary was evaluated four times: twice by members of a Cochrane editorial group and twice by laypersons. The summaries were assessed in three different ways. First, all assessors evaluated the summaries for informativeness, readability, and level of detail using a Likert scale from 1 to 10; they were also asked whether they would submit the summary and whether they could identify who had written it. Second, members of a Cochrane editorial group assessed the summaries using a checklist based on Cochrane's guidelines for plain language summaries, with scores ranging from 0 to 10. Finally, the readability of the summaries was analyzed using objective tools, namely Lix and Flesch-Kincaid scores. Randomization and allocation to either ChatGPT-4o or human-written summaries were conducted using random.org's random sequence generator, and assessors were blinded to the authorship of the summaries.

RESULTS: The plain language summaries generated by ChatGPT-4o scored 1 point higher on informativeness (P < .001) and level of detail (P = .004), and 2 points higher on readability (P = .002) compared with human-written summaries. Lix and Flesch-Kincaid scores were high for both groups of summaries, though the ChatGPT summaries were slightly easier to read (P < .001). Assessors found it difficult to distinguish between ChatGPT and human-written summaries, with only 20% correctly identifying ChatGPT-generated text. ChatGPT summaries were preferred for submission over the human-written summaries (64% vs. 36%, P < .001).

CONCLUSION: ChatGPT-4o shows promise in creating plain language summaries for Cochrane reviews at least as well as humans and, in some cases, slightly better. This study suggests that ChatGPT-4o could become a tool for drafting easy-to-understand plain language summaries for Cochrane reviews, with a quality approaching or matching that of human authors.

CLINICAL TRIAL REGISTRATION AND PROTOCOL: Available at https://osf.io/aq6r5.
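The Lix and Flesch-Kincaid measures referenced in the Methods are standard readability formulas. The sketch below is a minimal illustration of how such scores can be computed, not the authors' actual analysis pipeline; the function names and the vowel-group syllable heuristic are assumptions for the example, since the abstract does not state which implementation the study used.

```python
import re

def _words(text: str) -> list[str]:
    # Very simple tokenizer: alphabetic runs only.
    return re.findall(r"[A-Za-z]+", text)

def _sentences(text: str) -> int:
    # Count sentence-ending punctuation; never return 0 to avoid division errors.
    return max(1, len(re.findall(r"[.!?]+", text)))

def _syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels (assumption, not the paper's tool).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def lix(text: str) -> float:
    """Lix = words/sentences + 100 * long_words/words, where long words have more than 6 letters."""
    words = _words(text)
    if not words:
        return 0.0
    long_words = [w for w in words if len(w) > 6]
    return len(words) / _sentences(text) + 100 * len(long_words) / len(words)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    words = _words(text)
    if not words:
        return 0.0
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * len(words) / _sentences(text) + 11.8 * syllables / len(words) - 15.59

if __name__ == "__main__":
    sample = ("This review looked at whether the treatment helped people recover. "
              "We found moderate-certainty evidence of a small benefit.")
    print(f"Lix: {lix(sample):.1f}")
    print(f"Flesch-Kincaid grade: {flesch_kincaid_grade(sample):.1f}")
```

Higher scores on both measures indicate harder text, which is why the abstract describes the scores for both groups as "high" while noting the ChatGPT summaries were slightly easier to read.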


Figures (PMC12302524):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4c3/12302524/7843129f414e/CESM-3-e70037-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4c3/12302524/ab09629a7340/CESM-3-e70037-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4c3/12302524/e5f9b4ad99bf/CESM-3-e70037-g003.jpg

Similar Articles

[1]
ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.

Cochrane Evid Synth Methods. 2025-7-28

[2]
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022-5-20

[3]
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of topotecan for ovarian cancer.

Health Technol Assess. 2001

[4]
Eliciting adverse effects data from participants in clinical trials.

Cochrane Database Syst Rev. 2018-1-16

[5]
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.

Syst Rev. 2024-11-26

[6]
Home treatment for mental health problems: a systematic review.

Health Technol Assess. 2001

[7]
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021-4-19

[8]
The quantity, quality and findings of network meta-analyses evaluating the effectiveness of GLP-1 RAs for weight loss: a scoping review.

Health Technol Assess. 2025-6-25

[9]
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.

Health Technol Assess. 2001

[10]
Interventions for promoting habitual exercise in people living with and beyond cancer.

Cochrane Database Syst Rev. 2018-9-19

References Cited in This Article

[1]
The use of a large language model to create plain language summaries of evidence reviews in healthcare: A feasibility study.

Cochrane Evid Synth Methods. 2024-2-4

[2]
Unity in Diversity: Collaborative Pre-training Across Multimodal Medical Sources.

Proc Conf Assoc Comput Linguist Meet. 2024-8

[3]
Assessing AI Simplification of Medical Texts: Readability and Content Fidelity.

Int J Med Inform. 2025-3

[4]
Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success).

Proc Conf Assoc Comput Linguist Meet. 2023-7

[5]
Artificial intelligence as a modality to enhance the readability of neurosurgical literature for patients.

J Neurosurg. 2024-11-8

[6]
Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023.

JCO Clin Cancer Inform. 2024-5

[7]
Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology.

Am J Obstet Gynecol. 2024-8

[8]
Can ChatGPT assist authors with abstract writing in medical journals? Evaluating the quality of scientific abstracts generated by ChatGPT and original abstracts.

PLoS One. 2024

[9]
ChatGPT-4 and Human Researchers Are Equal in Writing Scientific Introduction Sections: A Blinded, Randomized, Non-inferiority Controlled Study.

Cureus. 2023-11-18

[10]
A comparison of cover letters written by ChatGPT-4 or humans.

Dan Med J. 2023-11-23
