Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

Authors

Weuthen Felix A, Otte Nelly, Krabbe Hanif, Kraus Thomas, Krabbe Julia

Affiliations

Institute of Occupational, Social and Environmental Medicine, Medical Faculty, RWTH Aachen University, Aachen, Germany.

Department of Vascular Surgery, St. Josef Hospital Bochum, Katholisches Klinikum Bochum, Medical Faculty, Ruhr University Bochum, Bochum, Germany.

Publication

JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.


DOI: 10.2196/63857
PMID: 40393042
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12112251/
Abstract

BACKGROUND: Artificial intelligence is becoming a part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored.

OBJECTIVE: This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany.

METHODS: In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work on 3 cases of occupational lung diseases and answer case-related questions. They were allocated into 2 groups via a coin weighted for the proportion of physicians in each group. One group researched the cases using an integrated chat application similar to ChatGPT based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers, while secondary outcomes included changes in specific question accuracy and self-assessed occupational medicine expertise before and after case processing. Group assignment was not traditionally blinded, as the chat window indicated membership; participants only knew the study examined web-based research, not group specifics.
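The weighted-coin allocation described in the METHODS can be sketched as a biased-coin scheme that steers physicians toward the group currently holding fewer of them. This is an illustrative sketch only: the group names, the base probability, and the bias magnitude are assumptions, as the abstract does not specify the exact weighting rule.

```python
import random

def biased_coin_assign(participants, p_base=0.5, bias=0.2):
    """Biased-coin allocation (illustrative sketch; the paper's exact
    weighting rule is not reported in the abstract).

    Each physician is assigned with a coin whose weight shifts against
    the group that currently holds more physicians, keeping physician
    proportions roughly balanced across the two groups.
    `participants` is a list of dicts like {"id": 1, "physician": True}.
    """
    groups = {"chatgpt": [], "own_research": []}
    for p in participants:
        n_chat = sum(q["physician"] for q in groups["chatgpt"])
        n_own = sum(q["physician"] for q in groups["own_research"])
        p_chatgpt = p_base
        if p["physician"]:
            if n_chat > n_own:
                p_chatgpt = p_base - bias  # steer toward own_research
            elif n_own > n_chat:
                p_chatgpt = p_base + bias  # steer toward chatgpt
        key = "chatgpt" if random.random() < p_chatgpt else "own_research"
        groups[key].append(p)
    return groups
```

In a real trial the allocation sequence would be generated and concealed in advance; the point here is only how a weighted coin can balance a stratum (physician status) across arms.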
RESULTS: Participants of the ChatGPT group (n=27) showed better performance in specific research, for example, for potentially hazardous substances or activities (eg, case 1: ChatGPT group 2.5 hazardous substances that cause pleural changes versus 1.8 in a group with own research; P=.01; Cohen r=-0.38), and led to an increase in self-assessment with regard to specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own research group; German school grades between 1=very good and 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly as a result of the participant's own research (n=29; eg, case 1: Should an occupational disease report be filed? Yes for 7 participants in the ChatGPT group vs 14 in their own research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36).

CONCLUSIONS: ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported and not made by the large language model. Future systems should be critically assessed, even if the initial results are promising.
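For readers unfamiliar with the statistics reported above, here is a minimal sketch of how an odds ratio and a Wald 95% CI are computed from a 2×2 table. The counts used are hypothetical: the abstract reports only the "yes" answers per group, not the full contingency table behind OR 6.00 (95% CI 1.54-23.36), so this example does not reproduce the paper's figures.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table:
                 outcome+  outcome-
        group 1     a         b
        group 2     c         d
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical counts (correct vs incorrect decisions per group):
result = odds_ratio_ci(14, 15, 7, 20)
```

An OR above 1 here would mean group 1 has higher odds of the outcome (eg, a correct filing decision) than group 2; the CI excluding 1 corresponds to a significant effect at the 5% level.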


Figures (from PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ffa/12112251/51142e5ae6c9/formative-v9-e63857-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ffa/12112251/ae6fa2fd563b/formative-v9-e63857-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ffa/12112251/b661a986466d/formative-v9-e63857-g003.jpg

Similar Articles

[1]
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025-5-20

[2]
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022-5-20

[3]
Eliciting adverse effects data from participants in clinical trials.

Cochrane Database Syst Rev. 2018-1-16

[4]
Antiretroviral post-exposure prophylaxis (PEP) for occupational HIV exposure.

Cochrane Database Syst Rev. 2007-1-24

[5]
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of topotecan for ovarian cancer.

Health Technol Assess. 2001

[6]
Home treatment for mental health problems: a systematic review.

Health Technol Assess. 2001

[7]
Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.

Health Technol Assess. 2008-6

[8]
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.

Health Technol Assess. 2001

[9]
Shared decision-making for people with asthma.

Cochrane Database Syst Rev. 2017-10-3

[10]
Shared decision-making interventions for people with mental health conditions.

Cochrane Database Syst Rev. 2022-11-11

Cited By

[1]
Gender Differences in the Use of ChatGPT as Generative Artificial Intelligence for Clinical Research and Decision-Making in Occupational Medicine.

Healthcare (Basel). 2025-6-11

References

[1]
Appraisal of ChatGPT's Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination.

JMIR Med Educ. 2024-7-23

[2]
Optimizing Diagnostic Performance of ChatGPT: The Impact of Prompt Engineering on Thoracic Radiology Cases.

Cureus. 2024-5-9

[3]
Can Chat-GPT assist orthopedic surgeons in evaluating the quality of rotator cuff surgery patient information videos?

J Shoulder Elbow Surg. 2025-1

[4]
Evaluation of ChatGPT-Generated Educational Patient Pamphlets for Common Interventional Radiology Procedures.

Acad Radiol. 2024-11

[5]
Comparing the Diagnostic Performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and Radiologists in Challenging Neuroradiology Cases.

Clin Neuroradiol. 2024-12

[6]
Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment.

Am J Emerg Med. 2024-6

[7]
Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage?

Am J Emerg Med. 2024-5

[8]
Evaluation of a chat GPT generated patient information leaflet about laparoscopic cholecystectomy.

ANZ J Surg. 2024-3

[9]
ChatGPT versus clinician: challenging the diagnostic capabilities of artificial intelligence in dermatology.

Clin Exp Dermatol. 2024-6-25

[10]
Hallucination or Confabulation? Neuroanatomy as metaphor in Large Language Models.

PLOS Digit Health. 2023-11-1
