Şişman Alanur Çiftçi, Acar Ahmet Hüseyin
Hamidiye Faculty of Dental Medicine, Department of Oral and Maxillofacial Surgery, University of Health Sciences, Istanbul, Türkiye.
Faculty of Dentistry, Department of Oral and Maxillofacial Surgery, Istanbul Medeniyet University, Istanbul, Türkiye.
BMC Oral Health. 2025 Mar 7;25(1):351. doi: 10.1186/s12903-025-05732-w.
This study aims to evaluate the potential of AI-based chatbots to assist with clinical decision-making in the management of medically complex patients undergoing oral surgery.
A team of oral and maxillofacial surgeons developed a pool of open-ended questions de novo. The validity of the questions was assessed using Lawshe's Content Validity Index. The questions, which focused on systemic diseases and common conditions that may raise concerns during oral surgery, were presented to ChatGPT 3.5 and Claude-instant in two separate sessions, spaced one week apart. Two experienced maxillofacial surgeons, blinded to which chatbot produced each response, assessed the responses for quality, accuracy, and completeness using a modified DISCERN tool and a Likert scale. Intraclass correlation coefficients, the Mann-Whitney U test, and skewness and kurtosis coefficients were used to compare the chatbots' performance.
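For readers less familiar with these methods, the sketch below illustrates the two quantitative steps named above: Lawshe's content validity ratio (CVR), computed per question from expert panel judgments, and the Mann-Whitney U comparison of rating distributions. This is a minimal illustration, not the authors' analysis code; the rating arrays are hypothetical, and only standard scipy functions are used (inter-rater reliability via ICC, also mentioned above, would require a dedicated routine such as pingouin's intraclass_corr and is omitted here).

```python
# Illustrative sketch (not the study's code): Lawshe's CVR and the
# between-chatbot comparison, using hypothetical rating data.
from scipy.stats import mannwhitneyu, skew, kurtosis

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), where n_e is the number of
    panelists rating an item 'essential' and N is the panel size."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Example: 7 of 8 surgeons rate a question essential -> CVR = 0.75
print(content_validity_ratio(7, 8))

# Hypothetical 5-point Likert quality scores for each chatbot's responses.
chatgpt_scores = [5, 4, 5, 3, 5, 4, 4, 5, 5, 4]
claude_scores = [4, 5, 5, 4, 3, 5, 4, 4, 5, 5]

# Mann-Whitney U test: do the two score distributions differ?
u_stat, p_value = mannwhitneyu(chatgpt_scores, claude_scores)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")

# Skewness and kurtosis describe the shape of each score distribution.
print(skew(chatgpt_scores), kurtosis(chatgpt_scores))
```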
Most responses were rated as high quality: 86% and 79.6% for ChatGPT, and 81.25% and 89% for Claude-instant, in sessions 1 and 2, respectively. For accuracy, 92% and 93.4% of ChatGPT's responses were rated completely correct in sessions 1 and 2, respectively, versus 95.2% and 89% for Claude-instant. For completeness, 88.5% and 86.8% of ChatGPT's responses were rated adequate or comprehensive in sessions 1 and 2, respectively, versus 95.2% and 86% for Claude-instant.
Ongoing software development and the growing acceptance of chatbots among healthcare professionals suggest that these tools could help meet the high demand for medical care, ease professionals' workload, reduce costs, and save time.