Freire Yolanda, Santamaría Laorden Andrea, Orejas Pérez Jaime, Ortiz Collado Ignacio, Gómez Sánchez Margarita, Thuissard Vasallo Israel J, Díaz-Flores García Víctor, Suárez Ana
Department of Preclinical Dentistry II, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, Madrid, Spain.
Department of Preclinical Dentistry I, Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, Madrid, Spain.
PLoS One. 2025 May 30;20(5):e0323086. doi: 10.1371/journal.pone.0323086. eCollection 2025.
Large language models (LLMs) such as ChatGPT are widely available to any dental professional. However, there is limited evidence evaluating the reliability and reproducibility of ChatGPT-4 in relation to implant-supported prostheses, or the impact of prompt design on its responses. This constrains understanding of its application within this specific area of dentistry. The purpose of this study was to evaluate the performance of ChatGPT-4 in generating answers about implant-supported prostheses using different prompts. Thirty questions on implant-supported and implant-retained prostheses were posed, with 30 answers generated per question using general and specific prompts, totaling 1800 answers. Experts assessed reliability (agreement with expert grading) and repeatability (response consistency) using a 3-point Likert scale. General prompts achieved 70.89% reliability, with repeatability ranging from moderate to almost perfect. Specific prompts performed better, with 78.8% reliability and substantial to almost perfect repeatability; the improvement in reliability over the general prompt was statistically significant. Despite these promising results, ChatGPT's ability to generate reliable answers on implant-supported prostheses remains limited, highlighting the need for professional oversight. The use of a specific prompt might improve the answer generation performance of ChatGPT.