Suárez Ana, Díaz-Flores García Víctor, Algar Juan, Gómez Sánchez Margarita, Llorente de Pedro María, Freire Yolanda
Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain.
Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain.
Int Endod J. 2024 Jan;57(1):108-113. doi: 10.1111/iej.13985. Epub 2023 Oct 9.
Chat Generative Pre-trained Transformer (ChatGPT) is generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversation and generate novel content from the data it was trained on. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics against answers provided by human experts.
Ninety-one dichotomous (yes/no) questions were designed and categorized into three difficulty levels, and twenty questions were randomly selected from each level. ChatGPT generated sixty answers to each question, and two endodontic experts independently answered the same 60 questions. Statistical analysis was performed in SPSS to calculate the consistency of the ChatGPT-generated answers and their accuracy against the experts' answers; 95% confidence intervals and standard deviations were used to estimate variability.
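The abstract does not specify how these metrics were computed. The sketch below is a minimal Python illustration, assuming consistency is the share of a question's repeated answers that match its modal answer and accuracy the share that match the expert consensus answer. The `runs` and `expert` structures, the `ci95` helper and the toy data are all hypothetical, not taken from the study.

```python
# Hypothetical sketch: consistency and accuracy of repeated yes/no answers.
# `runs` maps each question ID to its list of 60 ChatGPT answers;
# `expert` maps each question ID to the expert consensus answer (assumed layout).
from collections import Counter
import math

def consistency(answers):
    """Share of repeated answers matching the modal (most frequent) answer."""
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)

def accuracy(answers, truth):
    """Share of repeated answers matching the expert consensus answer."""
    return sum(a == truth for a in answers) / len(answers)

def ci95(p, n):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

runs = {"Q1": ["yes"] * 51 + ["no"] * 9}   # toy data, not the study's
expert = {"Q1": "yes"}

for qid, answers in runs.items():
    c = consistency(answers)
    a = accuracy(answers, expert[qid])
    print(qid, f"consistency={c:.2%}", f"accuracy={a:.2%}", ci95(a, len(answers)))
```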
The answers generated by ChatGPT showed high consistency (85.44%), with no significant differences in consistency across difficulty levels. In terms of accuracy, ChatGPT averaged 57.33%; accuracy did, however, differ significantly by question difficulty, with lower accuracy on the easier questions.
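As a rough worked illustration only: plugging the reported 57.33% average accuracy into the hypothetical `ci95` helper sketched above, under the strong (and assumed) simplification that all 3600 generated answers (60 questions x 60 repetitions) are independent, gives a Wald 95% confidence interval of roughly 55.7% to 58.9%. The study's actual unit of analysis and interval method may differ.

```python
low, high = ci95(0.5733, 3600)   # ≈ (0.557, 0.589), illustrative only
```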
Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As its performance improves through deep learning, it is expected to become more useful and effective in endodontics; however, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety.