Cerqueira Bruno Pellozo, Leite Vinicius Cappellette da Silva, França Carla Gonzaga, Leitão Filho Fernando Sergio, Faresin Sonia Maria, Figueiredo Ricardo Gassmann, Cetlin Andrea Antunes, Caetano Lilian Serrasqueiro Ballini, Baddini-Martinez José
Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo (SP) Brasil.
Divisão de Pneumologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo (SP) Brasil.
J Bras Pneumol. 2025 Sep 8;51(3):e20240388. doi: 10.36416/1806-3756/e20240388. eCollection 2025.
OBJECTIVE: To evaluate the quality of ChatGPT's answers to asthma-related questions from the perspectives of asthma specialists and laypersons.
METHODS: Seven asthma-related questions were posed to ChatGPT (version 4) between May 3 and May 4, 2024. The questions were standardized, and each was submitted in a session with no memory of previous conversations, to avoid bias. Six pulmonologists with extensive expertise in asthma acted as judges, independently assessing the quality and reproducibility of the answers from the perspectives of both asthma specialists and laypersons. Answers were rated on a 4-point Likert scale (1 to 4), and the content validity coefficient (CVC) was calculated to assess the level of agreement among the judges.
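The abstract does not state which CVC formula the authors used; as an illustration only, the sketch below assumes the Hernández-Nieto formulation often adopted in validation studies of this kind, with hypothetical ratings from the six judges on the 1-to-4 scale.

```python
# Illustrative sketch only: content validity coefficient (CVC) computed with
# the Hernandez-Nieto formulation -- an assumption, since the abstract does
# not specify the formula. All ratings below are hypothetical.

V_MAX = 4  # top of the 1-4 Likert scale used by the judges

def cvc(scores):
    """Corrected CVC for one question from the judges' Likert scores."""
    n = len(scores)
    cvc_i = (sum(scores) / n) / V_MAX  # mean rating relative to the maximum
    pe_i = (1 / n) ** n                # correction for possible judge bias
    return cvc_i - pe_i

# Hypothetical ratings from the six judges for one of the seven questions
print(f"CVC = {cvc([3, 4, 3, 4, 3, 4]):.3f}")  # prints CVC = 0.875
```

On this formulation, the 0.80 acceptability threshold reported in the results corresponds to a mean rating of roughly 3.2 on the 4-point scale.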
RESULTS: The evaluations showed variability in the quality of the answers provided by ChatGPT. From the perspective of asthma specialists, scores ranged from 2 to 3, with greater divergence among the judges for questions 2, 3, and 5. From the perspective of laypersons, the content validity coefficient exceeded 0.80 for four of the seven questions; most answers were correct, albeit lacking depth.
CONCLUSIONS: ChatGPT performed well in providing answers suited to laypersons, but the answers it provided to specialists were less accurate and more superficial. Although AI has the potential to provide useful information to the public, it should not replace medical guidance. Critical analysis of AI-generated information remains essential for health care professionals and laypersons alike, especially for complex conditions such as asthma.