Weuthen Felix A, Otte Nelly, Krabbe Hanif, Kraus Thomas, Krabbe Julia
Institute of Occupational, Social and Environmental Medicine, Medical Faculty, Rheinisch-Westfälische Technische Hochschule Aachen University, Aachen, Germany.
Department of Vascular Surgery, St. Josef Hospital Bochum, Katholisches Klinikum Bochum, Medical Faculty, Ruhr University Bochum, Bochum, Germany.
JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
BACKGROUND: Artificial intelligence is becoming part of daily life and the medical field. Generative artificial intelligence models, such as GPT-4 and ChatGPT, are experiencing a surge in popularity due to their enhanced performance and reliability. However, the application of these models in specialized domains, such as occupational medicine, remains largely unexplored. OBJECTIVE: This study aims to assess the potential suitability of a generative large language model, such as ChatGPT, as a support tool for medical research and even clinical decisions in occupational medicine in Germany. METHODS: In this randomized controlled study, the usability of ChatGPT for medical research and clinical decision-making was investigated using a web application developed for this purpose. Eligibility criteria were being a physician or medical student. Participants (N=56) were asked to work through 3 cases of occupational lung diseases and answer case-related questions. They were allocated into 2 groups via coin toss, weighted to balance the proportion of physicians in each group. One group researched the cases using an integrated chat application similar to ChatGPT, based on the latest GPT-4-Turbo model, while the other used their usual research methods, such as Google, Amboss, or DocCheck. The primary outcome was case performance based on correct answers; secondary outcomes included changes in accuracy on specific questions and in self-assessed occupational medicine expertise before and after case processing. Group assignment was not blinded in the traditional sense, as the chat window revealed group membership; participants knew only that the study examined web-based research, not the specifics of the groups.
RESULTS: Participants in the ChatGPT group (n=27) performed better on specific research tasks, for example, identifying potentially hazardous substances or activities (eg, case 1: 2.5 hazardous substances causing pleural changes in the ChatGPT group vs 1.8 in the own-research group; P=.01; Cohen r=-0.38), and showed a greater improvement in self-assessed specialist knowledge (from 3.9 to 3.4 in the ChatGPT group vs from 3.5 to 3.4 in the own-research group; German school grades from 1=very good to 6=unsatisfactory; P=.047). However, clinical decisions, for example, whether an occupational disease report should be filed, were more often made correctly by participants who did their own research (n=29; eg, case 1: Should an occupational disease report be filed? Answered yes by 7 participants in the ChatGPT group vs 14 in the own-research group; P=.007; odds ratio 6.00, 95% CI 1.54-23.36). CONCLUSIONS: ChatGPT can be a useful tool for targeted medical research, even for rather specific questions in occupational medicine regarding occupational diseases. However, clinical decisions should currently only be supported, not made, by the large language model. Future systems should be critically assessed, even if the initial results are promising.