Clinical Research With Large Language Models Generated Writing-Clinical Research with AI-assisted Writing (CRAW) Study.

Author information

Huespe Ivan A, Echeverri Jorge, Khalid Aisha, Carboni Bisso Indalecio, Musso Carlos G, Surani Salim, Bansal Vikas, Kashyap Rahul

Affiliations

Hospital Italiano de Buenos Aires, Buenos Aires, Argentina.

Universidad de Buenos Aires, Buenos Aires, Argentina.

Publication information

Crit Care Explor. 2023 Oct 2;5(10):e0975. doi: 10.1097/CCE.0000000000000975. eCollection 2023 Oct.

DOI:10.1097/CCE.0000000000000975

PMID:37795455

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10547240/

Abstract

IMPORTANCE

The scientific community debates Generative Pre-trained Transformer (GPT)-3.5's article quality, authorship merit, originality, and ethical use in scientific writing.

OBJECTIVES

Assess GPT-3.5's ability to craft the background section of critical care clinical research questions compared to medical researchers with H-indices of 22 and 13.

DESIGN

Observational cross-sectional study.

SETTING

Researchers from 20 countries from six continents evaluated the backgrounds.

PARTICIPANTS

Researchers with a Scopus index greater than 1 were included.

MAIN OUTCOMES AND MEASURES

In this study, we generated a background section of a critical care clinical research question on "acute kidney injury in sepsis" using three different methods: researcher with H-index greater than 20, researcher with H-index greater than 10, and GPT-3.5. The three background sections were presented in a blinded survey to researchers with an H-index range between 1 and 96. First, the researchers evaluated the main components of the background using a 5-point Likert scale. Second, they were asked to identify which background was written by humans only or with large language model-generated tools.

RESULTS

A total of 80 researchers completed the survey. The median H-index was 3 (interquartile range, 1-7.25), and the largest group of researchers (36%) came from the Critical Care specialty. When compared with researchers with an H-index of 22 and 13, GPT-3.5 was marked high on the Likert scale ranking on main background components (median 4.5 vs. 3.82 vs. 3.6 vs. 4.5, respectively; p < 0.001). The sensitivity and specificity to detect researchers writing versus GPT-3.5 writing were poor, 22.4% and 57.6%, respectively.

CONCLUSIONS AND RELEVANCE

GPT-3.5 could create background research content indistinguishable from the writing of a medical researcher. It was marked higher compared with medical researchers with an H-index of 22 and 13 in writing the background section of a critical care clinical research question.
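For readers less familiar with the detection metrics reported in the results, the following is a minimal sketch of how sensitivity and specificity are computed from a 2x2 classification table. It assumes that "GPT-3.5-written" is treated as the positive class, which the abstract does not state explicitly. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the cells of a 2x2 table.

    Assumption for this sketch: the positive class is "text written by GPT-3.5".
    tp: GPT-3.5 texts correctly labelled as GPT-3.5
    fn: GPT-3.5 texts wrongly labelled as human
    tn: human texts correctly labelled as human
    fp: human texts wrongly labelled as GPT-3.5
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one rated text")
    sensitivity = tp / (tp + fn)  # share of GPT-3.5 texts that were detected
    specificity = tn / (tn + fp)  # share of human texts recognised as human
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the CRAW study.
sens, spec = sensitivity_specificity(tp=20, fn=60, tn=96, fp=64)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A low sensitivity under this convention means that raters missed most of the machine-written text, which is consistent with the conclusion that the GPT-3.5 background was hard to tell apart from human writing.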

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f26/10547240/4f8593a635b9/cc9-5-e0975-g001.jpg
