Ayoub Marc, Ballout Ahmad A, Zayek Rosana A, Ayoub Noel F
Neurocritical Care, Northwell, Northshore University Hospital, Manhasset, USA.
Internal Medicine, Elmhurst Hospital Center, Mount Sinai School of Medicine, New York, USA.
Cureus. 2023 Aug 18;15(8):e43690. doi: 10.7759/cureus.43690. eCollection 2023 Aug.
Background: Generative artificial intelligence (AI) has been integrated into various industries, as it has demonstrated enormous potential for automating elaborate processes and enhancing complex decision-making. The ability of these chatbots to critically triage, diagnose, and manage complex medical conditions remains unknown and requires further research.

Objective: This cross-sectional study sought to quantitatively analyze the appropriateness of ChatGPT (OpenAI, San Francisco, CA, US) in triaging, synthesizing differential diagnoses, and generating treatment plans for nine diverse but common clinical scenarios.

Methods: Nine common clinical scenarios were developed. Each was input into ChatGPT, and the chatbot was asked to develop diagnostic and treatment plans. Five practicing physicians independently scored ChatGPT's responses to each scenario.

Results: The average overall score for the triage ranking was 4.2 (SD 0.7). The lowest overall score was for the completeness of the differential diagnosis, at 4.1 (SD 0.5). The highest overall scores were for the accuracy of the differential diagnosis, the initial treatment plan, and the overall usefulness of the response (each with an average score of 4.4). Variance among physician scores ranged from 0.24 for the accuracy of the differential diagnosis to 0.49 for the appropriateness of the triage ranking.

Discussion: ChatGPT has the potential to augment clinical decision-making. More extensive research, however, is needed to ensure that accurate and appropriate recommendations are provided.
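The abstract does not describe the study's prompting tooling, and the scenarios were presumably entered through the ChatGPT interface by hand. For readers who wish to reproduce a similar workflow programmatically, a minimal sketch using the OpenAI Python SDK (v1+) is shown below; the model name, prompt wording, and chest-pain vignette are all assumptions, not material from the study.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Hypothetical clinical vignette, standing in for one of the study's
# nine unpublished scenarios.
scenario = (
    "A 54-year-old man presents with sudden-onset crushing chest pain "
    "radiating to the left arm, diaphoresis, and shortness of breath."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": (
                f"Clinical scenario: {scenario}\n"
                "Provide a triage ranking, a differential diagnosis, "
                "and an initial treatment plan."
            ),
        },
    ],
)

# The free-text plan, which raters would then score on a Likert scale.
print(response.choices[0].message.content)
```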
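As a worked example of the summary statistics reported in the Results, the sketch below computes the mean, standard deviation, and variance of a set of five physician ratings using Python's standard statistics module. The individual scores are invented for illustration; the paper reports only the aggregate values.

```python
import statistics

# Hypothetical 1-5 Likert scores from the five physician raters for one
# scenario, keyed by evaluation criterion (illustrative values only).
scores = {
    "triage_ranking": [4, 5, 4, 3, 5],
    "ddx_completeness": [4, 4, 5, 4, 4],
    "ddx_accuracy": [5, 4, 5, 4, 4],
}

for criterion, ratings in scores.items():
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)      # sample standard deviation
    var = statistics.variance(ratings)  # sample variance; the paper does
                                        # not state sample vs. population
    print(f"{criterion}: mean={mean:.1f}, SD={sd:.1f}, variance={var:.2f}")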