Weuthen Felix A, Otte Nelly, Krabbe Hanif, Kraus Thomas, Krabbe Julia
Institute of Occupational, Social and Environmental Medicine, Medical Faculty, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen, Germany.
Department of Vascular Surgery, St. Josef Hospital Bochum, Katholisches Klinikum Bochum, Medical Faculty, Ruhr University Bochum, Bochum, Germany.
JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
BACKGROUND: Artificial intelligence is becoming part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored. OBJECTIVE: This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany. METHODS: In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work through 3 cases of occupational lung diseases and answer case-related questions. They were allocated into 2 groups via coin toss, weighted to balance the proportion of physicians in each group. One group researched the cases using an integrated chat application similar to ChatGPT, based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers; secondary outcomes included changes in accuracy on specific questions and in self-assessed occupational medicine expertise before and after case processing. Group assignment was not blinded in the traditional sense, as the chat window revealed group membership; participants knew only that the study examined web-based research, not the specifics of the groups.
RESULTS: Participants in the ChatGPT group (n=27) performed better on specific research tasks, for example, identifying potentially hazardous substances or activities (eg, case 1: 2.5 hazardous substances causing pleural changes in the ChatGPT group vs 1.8 in the own-research group; P=.01; Cohen r=-0.38), and showed a greater improvement in self-assessed specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own-research group; German school grades from 1=very good to 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly by participants who did their own research (n=29; eg, case 1: Should an occupational disease report be filed? Answered yes by 7 participants in the ChatGPT group vs 14 in the own-research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36). CONCLUSIONS: ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported, not made, by the large language model. Future systems should be critically assessed, even if the initial results are promising.