

Evaluating artificial intelligence chatbots for patient education in oral and maxillofacial radiology.

Author Information

Helvacioglu-Yigit Dilek, Demirturk Husniye, Ali Kamran, Tamimi Dania, Koenig Lisa, Almashraqi Abeer

Affiliations

College of Dental Medicine, QU Health, Qatar University, Doha, Qatar.

University of Pittsburgh School of Dental Medicine, Pittsburgh, PA, USA; Oral and Maxillofacial Radiology Consultant, Private Practice, Wexford, PA, USA.

Publication Information

Oral Surg Oral Med Oral Pathol Oral Radiol. 2025 Jun;139(6):750-759. doi: 10.1016/j.oooo.2025.01.001. Epub 2025 Jan 11.

Abstract

OBJECTIVE

This study aimed to compare the quality and readability of the responses generated by 3 publicly available artificial intelligence (AI) chatbots in answering frequently asked questions (FAQs) related to Oral and Maxillofacial Radiology (OMR) to assess their suitability for patient education.

STUDY DESIGN

Fifteen OMR-related questions were selected from professional patient information websites. These questions were posed to ChatGPT-3.5 by OpenAI, Gemini 1.5 Pro by Google, and Copilot by Microsoft to generate responses. Three board-certified OMR specialists evaluated the responses regarding scientific adequacy, ease of understanding, and overall reader satisfaction. Readability was assessed using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores. The Wilcoxon signed-rank test was conducted to compare the scores assigned by the evaluators to the responses from the chatbots and professional websites. Interevaluator agreement was examined by calculating the Fleiss kappa coefficient.
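The FKGL and FRE scores used in the design above follow standard published formulas based on words per sentence and syllables per word. A minimal sketch in Python, using a naive vowel-group syllable counter (an assumption for illustration; real readability tools use dictionary-based syllable counts, so exact scores will differ):

```python
import re

def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRE) for a block of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fre
```

Higher FKGL means a higher school-grade level is needed; higher FRE means easier text. A mean FKGL near 13, as reported below, corresponds roughly to a first-year college reading level, well above the sixth-to-eighth-grade level commonly recommended for patient education materials.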

RESULTS

There were no significant differences between groups in terms of scientific adequacy. In terms of readability, chatbots had overall mean FKGL and FRE scores of 12.97 and 34.11, respectively. Interevaluator agreement level was generally high.
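The Fleiss kappa coefficient behind the agreement result above is computed from a subjects-by-categories table of rater counts. A minimal pure-Python sketch of the computation (illustrative only; the study's actual rating data are not shown here):

```python
def fleiss_kappa(table: list[list[int]]) -> float:
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned subject i to category j (equal raters per subject)."""
    n_subjects = len(table)
    n_raters = sum(table[0])
    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    grand = n_subjects * n_raters
    p_e = sum((t / grand) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Kappa ranges from below 0 (worse than chance) to 1 (perfect agreement); values above roughly 0.6 are conventionally read as substantial agreement, consistent with the "generally high" agreement reported here.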

CONCLUSIONS

Although chatbots perform relatively well in responding to FAQs, validating AI-generated information with input from healthcare professionals can enhance patient care and safety. The text content of both the chatbots and the professional websites demands a high reading level.

