Huespe Ivan A, Echeverri Jorge, Khalid Aisha, Carboni Bisso Indalecio, Musso Carlos G, Surani Salim, Bansal Vikas, Kashyap Rahul
Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.
Universidad de Buenos Aires, Buenos Aires, Argentina.
Crit Care Explor. 2023 Oct 2;5(10):e0975. doi: 10.1097/CCE.0000000000000975. eCollection 2023 Oct.
The scientific community debates Generative Pre-trained Transformer (GPT)-3.5's article quality, authorship merit, originality, and ethical use in scientific writing.
Assess GPT-3.5's ability to craft the background section of critical care clinical research questions compared to medical researchers with H-indices of 22 and 13.
Observational cross-sectional study.
Researchers from 20 countries across six continents evaluated the backgrounds.
Researchers with a Scopus index greater than 1 were included.
In this study, we generated a background section for a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: a researcher with an H-index greater than 20, a researcher with an H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with H-indices ranging from 1 to 96. First, the researchers evaluated the main components of each background using a 5-point Likert scale. Second, they were asked to identify which backgrounds were written by humans only and which were written with large language model-based tools.
A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1-7.25), and the largest group of researchers (36%) was from the Critical Care specialty. When compared with the researchers with H-indices of 22 and 13, GPT-3.5 was rated highly on the Likert scale for the main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity for distinguishing researcher writing from GPT-3.5 writing were poor: 22.4% and 57.6%, respectively.
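For readers less familiar with the detection metrics reported above, the following is a minimal sketch of how sensitivity and specificity are computed from a 2x2 table of rater judgments. It assumes, for illustration only, that "written by GPT-3.5" is treated as the positive class; the abstract does not state which class the authors treated as positive. The function name and the counts are hypothetical and are not the study's data.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from 2x2 confusion-matrix counts.

    tp: positive items correctly flagged as positive
    fn: positive items missed (judged negative)
    tn: negative items correctly judged negative
    fp: negative items wrongly flagged as positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each true class needs at least one item")
    sensitivity = tp / (tp + fn)  # share of true positives that were detected
    specificity = tn / (tn + fp)  # share of true negatives that were recognized
    return sensitivity, specificity


# Hypothetical counts, chosen only to show the arithmetic (NOT study data).
# Assumed positive class: text written by GPT-3.5.
sens, spec = sensitivity_specificity(tp=2, fn=8, tn=12, fp=8)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Under this assumed labeling, a low sensitivity would mean raters rarely recognized GPT-3.5 text as machine-written, and a specificity near 50% would mean human-written text was flagged about as often as it was cleared.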
GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher. It was rated higher than medical researchers with H-indices of 22 and 13 in writing the background section of a critical care clinical research question.