Høj Simon, Thomsen Simon Francis, Ulrik Charlotte Suppli, Meteran Hanieh, Sigsgaard Torben, Meteran Howraman
Department of Dermatology, Venereology, and Wound Healing Centre, Copenhagen University Hospital-Bispebjerg, Bispebjerg, Denmark.
Department of Public Health, Environment, Occupation, and Health, Aarhus University, Aarhus, Denmark.
J Allergy Clin Immunol Glob. 2024 Aug 28;3(4):100330. doi: 10.1016/j.jacig.2024.100330. eCollection 2024 Nov.
This study assessed the reliability of ChatGPT as a source of information on asthma, given the increasing use of artificial intelligence-driven models for medical information. Prior concerns about misinformation on atopic diseases in various digital platforms underline the importance of this evaluation.
We aimed to evaluate the scientific reliability of ChatGPT as a source of information on asthma.
The study analyzed ChatGPT's responses to 26 asthma-related questions, each paired with a follow-up question. The questions covered definition/risk factors, diagnosis, treatment, lifestyle factors, and specific clinical inquiries. Medical professionals specializing in allergic and respiratory diseases independently assessed the responses using a 1-to-5 accuracy scale.
Approximately 81% of the responses scored 4 or higher, suggesting a generally high accuracy level. However, 5 responses scored >3, indicating minor potentially harmful inaccuracies. The overall median score was 4. The Fleiss multirater kappa value showed moderate agreement among raters.
ChatGPT generally provides reliable asthma-related information, but limitations were noted, such as a lack of depth in certain responses and an inability to cite sources or update in real time. It shows promise as an educational tool, but it should not be a substitute for professional medical advice. Future studies should explore its applicability for different user demographics and compare it with newer artificial intelligence models.
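For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how a Fleiss multirater kappa and a median score could be computed from ratings on a 1-to-5 scale. It is not the authors' analysis code. The function name `fleiss_kappa`, the variable `example_ratings`, and the ratings themselves are hypothetical and chosen only for illustration; the abstract does not report the number of raters or the individual scores.

```python
from collections import Counter
from statistics import median


def fleiss_kappa(ratings, categories=(1, 2, 3, 4, 5)):
    """Fleiss' kappa for a list of items, each rated once by every rater.

    ratings: list of lists; ratings[i] holds the scores given to item i.
    All items must have the same number of ratings (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many raters assigned each category to each item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the per-item agreement P_i.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall category proportions.
    total = n_items * n_raters
    proportions = [sum(c[k] for c in counts) / total for k in categories]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # All ratings fall in a single category: kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data: 4 responses, each scored by 3 raters (not study data).
example_ratings = [[5, 5, 4], [4, 4, 4], [3, 4, 2], [5, 5, 5]]
example_kappa = fleiss_kappa(example_ratings)
example_median = median(score for item in example_ratings for score in item)
```

Published guidance on interpreting kappa varies; a commonly used convention labels values between 0.41 and 0.60 as moderate agreement.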