Sikander Binyamin, Baker Jason J, Deveci Can D, Lund Lars, Rosenberg Jacob
Surgery, Herlev Hospital, Herlev, DNK.
Urology, Odense University Hospital, Odense, DNK.
Cureus. 2023 Nov 18;15(11):e49019. doi: 10.7759/cureus.49019. eCollection 2023 Nov.
Background: Natural language processing models are increasingly used in scientific research, and their ability to perform various tasks in the research process is rapidly advancing. This study aims to investigate whether Generative Pre-trained Transformer 4 (GPT-4) is equal to humans in writing introduction sections for scientific articles.

Methods: This randomized non-inferiority study was reported according to the Consolidated Standards of Reporting Trials for non-inferiority trials and artificial intelligence (AI) guidelines. GPT-4 was instructed to synthesize 18 introduction sections based on the aims of previously published studies, and these sections were compared to the human-written introductions already published in a medical journal. Eight blinded assessors randomly evaluated the introduction sections using 1-10 Likert scales.

Results: There was no significant difference between GPT-4 and human introductions regarding publishability and content quality. GPT-4 scored one point higher in readability, a statistically significant difference that was not considered relevant. The majority of assessors (59%) preferred GPT-4 introductions, while 33% preferred human-written introductions. GPT-4 introductions scored 10 points higher on Lix and 2 points higher on Flesch-Kincaid, indicating that their sentences and words were longer.

Conclusion: GPT-4 was found to be equal to humans in writing introductions regarding publishability, readability, and content quality. The majority of assessors preferred GPT-4 introductions, and fewer than half could determine which introductions were written by GPT-4 and which by humans. These findings suggest that GPT-4 can be a useful tool for writing introduction sections, and further studies should evaluate its ability to write other parts of scientific articles.
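To make the Lix comparison in the Results concrete, the following is a minimal sketch of how a Lix score can be computed, not the tool or procedure the study used. It uses the standard Lix definition (average words per sentence plus the percentage of words longer than six letters). The function name `lix` and the simple regex-based word and sentence splitting are assumptions made for illustration. Flesch-Kincaid is left out because it needs syllable counting.

```python
import re


def lix(text: str) -> float:
    """Approximate Lix readability score (illustrative sketch only).

    Lix = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    Assumption: sentences end at '.', '!' or '?'; words are runs of letters,
    so hyphenated terms and tokens such as "GPT-4" are split or truncated.
    """
    words = re.findall(r"[^\W\d_]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


if __name__ == "__main__":
    short_text = "The cat sat. The dog ran."
    dense_text = "Readability matters."
    print(lix(short_text))  # short sentences, no long words
    print(lix(dense_text))  # every word is longer than six letters
```

Under this definition, both longer sentences and a larger share of long words raise the score, which matches the abstract's reading of the higher GPT-4 scores.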