Suárez Ana, Díaz-Flores García Víctor, Algar Juan, Gómez Sánchez Margarita, Llorente de Pedro María, Freire Yolanda
Department of Pre-Clinic Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain.
Department of Clinical Dentistry, School of Biomedical Sciences, Universidad Europea de Madrid, Madrid, Spain.
Int Endod J. 2024 Jan;57(1):108-113. doi: 10.1111/iej.13985. Epub 2023 Oct 9.
Chat Generative Pre-trained Transformer (ChatGPT) is generative artificial intelligence (AI) software based on large language models (LLMs), designed to simulate human conversation and generate novel content from the data it was trained on. The aim of this study was to evaluate the consistency and accuracy of ChatGPT-generated answers to clinical questions in endodontics against answers provided by human experts.
Ninety-one dichotomous (yes/no) questions were designed and categorized into three difficulty levels, and twenty questions were randomly selected from each level. ChatGPT generated sixty answers to each question, and two endodontic experts independently answered the same 60 questions. Statistical analysis was performed in SPSS to calculate the consistency of the ChatGPT-generated answers and their accuracy against the experts' answers; 95% confidence intervals and standard deviations were used to estimate variability.
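The abstract does not specify how these metrics were computed. The sketch below is a minimal Python illustration, assuming consistency is the share of a question's repeated answers that match its modal answer and accuracy the share that match the expert consensus answer. The `runs` and `expert` structures, the `ci95` helper and the toy data are all hypothetical, not taken from the study.

```python
# Hypothetical sketch: consistency and accuracy of repeated yes/no answers.
# `runs` maps each question ID to its list of 60 ChatGPT answers;
# `expert` maps each question ID to the expert consensus answer (assumed layout).
from collections import Counter
import math

def consistency(answers):
    """Share of repeated answers matching the modal (most frequent) answer."""
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)

def accuracy(answers, truth):
    """Share of repeated answers matching the expert consensus answer."""
    return sum(a == truth for a in answers) / len(answers)

def ci95(p, n):
    """Normal-approximation (Wald) 95% confidence interval for a proportion."""
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

runs = {"Q1": ["yes"] * 51 + ["no"] * 9}   # toy data, not the study's
expert = {"Q1": "yes"}

for qid, answers in runs.items():
    c = consistency(answers)
    a = accuracy(answers, expert[qid])
    print(qid, f"consistency={c:.2%}", f"accuracy={a:.2%}", ci95(a, len(answers)))
```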
The answers generated by ChatGPT showed high consistency (85.44%), with no significant differences in consistency across difficulty levels. In terms of accuracy, ChatGPT averaged 57.33%; accuracy did, however, differ significantly by question difficulty, with lower accuracy on the easier questions.
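As a rough worked illustration only: plugging the reported 57.33% average accuracy into the hypothetical `ci95` helper sketched above, under the strong (and assumed) simplification that all 3600 generated answers (60 questions x 60 repetitions) are independent, gives a Wald 95% confidence interval of roughly 55.7% to 58.9%. The study's actual unit of analysis and interval method may differ.

```python
low, high = ci95(0.5733, 3600)   # ≈ (0.557, 0.589), illustrative only
```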
Currently, ChatGPT is not capable of replacing dentists in clinical decision-making. As its performance improves through deep learning, it is expected to become more useful and effective in endodontics; however, careful attention and ongoing evaluation are needed to ensure its accuracy, reliability and safety.