Man Versus Machine: A Comparative Study of Human and ChatGPT-Generated Abstracts in Plastic Surgery Research.

Author Information

Pressman Sophia M, Garcia John P, Borna Sahar, Gomez-Cabello Cesar A, Haider Syed Ali, Haider Clifton R, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL, 32224, USA.

Department of General Surgery, Mayo Clinic, Rochester, MN, USA.

Publication Information

Aesthetic Plast Surg. 2025 Apr 14. doi: 10.1007/s00266-025-04836-6.

Abstract

BACKGROUND

Since its 2022 release, ChatGPT has gained recognition for its potential to expedite time-consuming tasks such as scientific writing. Well-written scientific abstracts are essential for clear and efficient communication of research findings. This study aims to explore ChatGPT-4's capability to produce well-crafted abstracts.

METHODS

Ten plastic surgery articles from PubMed, with their original abstracts removed, were uploaded to ChatGPT, each with a prompt to generate one abstract. The Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES) were calculated for all abstracts. Additionally, three physician evaluators blindly assessed the ten original and ten ChatGPT-generated abstracts using a 5-point Likert scale. Results were compared and analyzed using descriptive statistics with mean and standard deviation (SD).
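The abstract does not state which tool was used to compute the readability scores, but the Flesch formulas themselves are standard and well documented. Below is a minimal Python sketch of how FKGL and FRES could be computed for an abstract; the vowel-group syllable counter is a rough assumption, and dedicated packages such as textstat use more careful heuristics, so results will differ slightly from any published tool.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_scores(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return fkgl, fres

sample = ("Well-written scientific abstracts are essential for clear and "
          "efficient communication of research findings.")
fkgl, fres = readability_scores(sample)
print(f"FKGL: {fkgl:.1f}, FRES: {fres:.1f}")
```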

RESULTS

The original abstracts averaged an FKGL of 14.1 (SD 2.9) and an FRES of 25.2 (SD 14.2), while ChatGPT-generated abstracts had scores of 15.6 (SD 2.4) and 15.4 (SD 13.1), respectively. Collectively, the evaluators correctly identified two-thirds of the ChatGPT abstracts as machine-generated, yet preferred the ChatGPT abstracts 90% of the time. On average, the evaluators rated the ChatGPT abstracts as more "well written" (4.23 vs. 3.50, p value < 0.001) and more "clear and concise" (4.30 vs. 3.53, p value < 0.001) than the original abstracts.
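The abstract reports p-values for these comparisons but does not name the statistical test used. Purely as an illustration, the sketch below summarizes two sets of invented evaluator ratings and compares them with an independent-samples t-test from SciPy; both the data and the choice of test are assumptions, not the authors' method.

```python
# Hypothetical evaluator ratings (1-5 Likert) for illustration only; not the study's data.
import statistics
from scipy.stats import ttest_ind

original_ratings = [3, 4, 3, 4, 3, 4, 3, 4, 4, 3]   # hypothetical "well written" scores
chatgpt_ratings  = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4]   # hypothetical "well written" scores

for label, scores in (("Original", original_ratings), ("ChatGPT", chatgpt_ratings)):
    print(f"{label}: mean={statistics.mean(scores):.2f}, SD={statistics.stdev(scores):.2f}")

# Independent-samples t-test on the hypothetical ratings.
t_stat, p_value = ttest_ind(chatgpt_ratings, original_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

For ordinal Likert data, a non-parametric alternative such as the Mann-Whitney U test (scipy.stats.mannwhitneyu) would be an equally reasonable choice.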

CONCLUSIONS

Despite a slightly higher reading level, evaluators generally preferred ChatGPT abstracts, which received higher ratings overall. These findings suggest ChatGPT holds promise in expediting the creation of high-quality scientific abstracts, potentially enhancing efficiency in research and scientific writing tasks. However, due to its exploratory nature, this study calls for additional research to validate these promising findings.

LEVEL OF EVIDENCE IV

This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
