Man Versus Machine: A Comparative Study of Human and ChatGPT-Generated Abstracts in Plastic Surgery Research.

Author Information

Pressman Sophia M, Garcia John P, Borna Sahar, Gomez-Cabello Cesar A, Haider Syed Ali, Haider Clifton R, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, 4500 San Pablo Rd, Jacksonville, FL, 32224, USA.

Department of General Surgery, Mayo Clinic, Rochester, MN, USA.

Publication Information

Aesthetic Plast Surg. 2025 Apr 14. doi: 10.1007/s00266-025-04836-6.

Abstract

BACKGROUND

Since its 2022 release, ChatGPT has gained recognition for its potential to expedite time-consuming tasks such as scientific writing. Well-written scientific abstracts are essential for clear and efficient communication of research findings. This study aims to explore ChatGPT-4's capability to produce well-crafted abstracts.

METHODS

Ten plastic surgery articles from PubMed, with their original abstracts removed, were uploaded to ChatGPT, each with a prompt to generate one abstract. The Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES) were calculated for all abstracts. Additionally, three physician evaluators blindly assessed the ten original and ten ChatGPT-generated abstracts using a 5-point Likert scale. Results were compared and analyzed using descriptive statistics with mean and standard deviation (SD).
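The abstract does not state which tool was used to compute the readability scores, but the Flesch formulas themselves are standard and well documented. Below is a minimal Python sketch of how FKGL and FRES could be computed for an abstract; the vowel-group syllable counter is a rough assumption, and dedicated packages such as textstat use more careful heuristics, so results will differ slightly from any published tool.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability_scores(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return fkgl, fres

sample = ("Well-written scientific abstracts are essential for clear and "
          "efficient communication of research findings.")
fkgl, fres = readability_scores(sample)
print(f"FKGL: {fkgl:.1f}, FRES: {fres:.1f}")
```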

RESULTS

The original abstracts averaged an FKGL of 14.1 (SD 2.9) and an FRES of 25.2 (SD 14.2), while ChatGPT-generated abstracts had scores of 15.6 (SD 2.4) and 15.4 (SD 13.1), respectively. Collectively, the evaluators correctly identified two-thirds of the ChatGPT abstracts as machine-generated, yet preferred the ChatGPT abstracts 90% of the time. On average, the evaluators rated the ChatGPT abstracts as more "well written" (4.23 vs. 3.50, p value < 0.001) and more "clear and concise" (4.30 vs. 3.53, p value < 0.001) than the original abstracts.
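The abstract reports p-values for these comparisons but does not name the statistical test used. Purely as an illustration, the sketch below summarizes two sets of invented evaluator ratings and compares them with an independent-samples t-test from SciPy; both the data and the choice of test are assumptions, not the authors' method.

```python
# Hypothetical evaluator ratings (1-5 Likert) for illustration only; not the study's data.
import statistics
from scipy.stats import ttest_ind

original_ratings = [3, 4, 3, 4, 3, 4, 3, 4, 4, 3]   # hypothetical "well written" scores
chatgpt_ratings  = [4, 5, 4, 4, 5, 4, 4, 5, 4, 4]   # hypothetical "well written" scores

for label, scores in (("Original", original_ratings), ("ChatGPT", chatgpt_ratings)):
    print(f"{label}: mean={statistics.mean(scores):.2f}, SD={statistics.stdev(scores):.2f}")

# Independent-samples t-test on the hypothetical ratings.
t_stat, p_value = ttest_ind(chatgpt_ratings, original_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

For ordinal Likert data, a non-parametric alternative such as the Mann-Whitney U test (scipy.stats.mannwhitneyu) would be an equally reasonable choice.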

CONCLUSIONS

Despite a slightly higher reading level, evaluators generally preferred ChatGPT abstracts, which received higher ratings overall. These findings suggest ChatGPT holds promise in expediting the creation of high-quality scientific abstracts, potentially enhancing efficiency in research and scientific writing tasks. However, due to its exploratory nature, this study calls for additional research to validate these promising findings.

LEVEL OF EVIDENCE IV

This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
