Güzelce Sultanoğlu Elifnur
Prosthodontics, Sağlık Bilimleri Üniversitesi, Istanbul, TUR.
Cureus. 2024 Oct 6;16(10):e70945. doi: 10.7759/cureus.70945. eCollection 2024 Oct.
Aim This study aimed to evaluate the accuracy and quality of answers given by artificial intelligence (AI) applications to questions about treatments for missing teeth.

Materials and methods Fifteen questions asked by patients/laypeople about missing-tooth treatment were selected from the Quora platform and posed to the ChatGPT-4 (OpenAI Inc., San Francisco, California, United States) and Copilot (Microsoft Corporation, Redmond, Washington, United States) models. Responses were assessed by two expert physicians using a five-point Likert scale (LS) for accuracy and the Global Quality Scale (GQS) for quality. To assess the internal consistency and inter-rater agreement of the ChatGPT-4 and Copilot ratings, Cronbach's alpha, the Spearman-Brown coefficient, and Guttman's split-half coefficient were calculated (α=0.05).

Results Copilot showed a mean LS value of 3.83±0.36, while ChatGPT-4 showed a higher mean value of 3.93±0.32. ChatGPT-4's mean GQS value (3.9±0.28) was also higher than Copilot's (3.83±0.06) (p<0.001).

Conclusion It can be said that AI chatbots gave highly accurate and consistent answers to questions about the treatment of missing teeth. As the technology continues to develop, AI chatbots may serve as consultants for dental treatments in the future.
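The three reliability coefficients named in the methods can be computed directly from a matrix of rating scores. The sketch below is a minimal, self-contained illustration, not the authors' analysis: the rating data are hypothetical (15 questions in rows, four hypothetical rating columns, e.g. two raters × two scales), and the odd/even item split is one common but arbitrary choice for the split-half coefficients.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(rows[0])
    cols = list(zip(*rows))
    item_var_sum = sum(variance(c) for c in cols)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def split_half(rows):
    """Spearman-Brown and Guttman split-half coefficients,
    using an odd/even split of the item columns."""
    half_a = [sum(r[::2]) for r in rows]   # items 1, 3, ...
    half_b = [sum(r[1::2]) for r in rows]  # items 2, 4, ...
    r = pearson(half_a, half_b)
    spearman_brown = 2 * r / (1 + r)
    total_var = variance([a + b for a, b in zip(half_a, half_b)])
    guttman = 2 * (1 - (variance(half_a) + variance(half_b)) / total_var)
    return spearman_brown, guttman

# Hypothetical five-point Likert ratings: 15 questions x 4 rating columns.
ratings = [
    [4, 4, 5, 4], [4, 3, 4, 4], [5, 5, 5, 4], [3, 4, 4, 3], [4, 4, 4, 4],
    [5, 4, 5, 5], [3, 3, 4, 3], [4, 5, 4, 4], [4, 4, 3, 4], [5, 5, 4, 5],
    [3, 3, 3, 4], [4, 4, 5, 5], [5, 4, 4, 4], [4, 3, 3, 3], [4, 5, 5, 5],
]

alpha = cronbach_alpha(ratings)
sb, guttman = split_half(ratings)
print(f"Cronbach's alpha: {alpha:.3f}")
print(f"Spearman-Brown:   {sb:.3f}")
print(f"Guttman split-half: {guttman:.3f}")
```

Values near 1 indicate high internal consistency and agreement between the two halves of the instrument; values above roughly 0.7 are conventionally read as acceptable reliability.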