
Clinical Research With Large Language Models Generated Writing-Clinical Research with AI-assisted Writing (CRAW) Study.

Author Information

Huespe Ivan A, Echeverri Jorge, Khalid Aisha, Carboni Bisso Indalecio, Musso Carlos G, Surani Salim, Bansal Vikas, Kashyap Rahul

Affiliations

Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.

Universidad de Buenos Aires, Buenos Aires, Argentina.

Publication Information

Crit Care Explor. 2023 Oct 2;5(10):e0975. doi: 10.1097/CCE.0000000000000975. eCollection 2023 Oct.

Abstract

IMPORTANCE

The scientific community debates Generative Pre-trained Transformer (GPT)-3.5's article quality, authorship merit, originality, and ethical use in scientific writing.

OBJECTIVES

Assess GPT-3.5's ability to craft the background section of critical care clinical research questions compared to medical researchers with H-indices of 22 and 13.

DESIGN

Observational cross-sectional study.

SETTING

Researchers from 20 countries across six continents evaluated the backgrounds.

PARTICIPANTS

Researchers with a Scopus index greater than 1 were included.

MAIN OUTCOMES AND MEASURES

In this study, we generated a background section of a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: a researcher with an H-index greater than 20, a researcher with an H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with H-indices ranging from 1 to 96. First, the researchers evaluated the main components of each background using a 5-point Likert scale. Second, they were asked to identify which backgrounds were written by humans alone and which with large language model-based tools.

RESULTS

A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1-7.25), and most (36%) researchers were from the Critical Care specialty. When compared with the researchers with H-indices of 22 and 13, GPT-3.5 was rated higher on the Likert scale for the main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity for detecting researcher-written versus GPT-3.5-written text were poor: 22.4% and 57.6%, respectively.

CONCLUSIONS AND RELEVANCE

GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher. It was rated higher than medical researchers with H-indices of 22 and 13 in writing the background section of a critical care clinical research question.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f26/10547240/4f8593a635b9/cc9-5-e0975-g001.jpg
