Department of Periodontology, Necmettin Erbakan University Faculty of Dentistry, Beyşehir Caddesi, Bağlarbaşı Sk., 42090 Meram, Konya, Turkey.
Clin Oral Investig. 2024 Jun 29;28(7):407. doi: 10.1007/s00784-024-05799-9.
This study assessed the ability of ChatGPT, an artificial intelligence (AI) language model, to determine the stage, grade, and extent of periodontitis based on the 2018 classification.
This study used baseline digital data of 200 untreated periodontitis patients to compare standardized reference diagnoses (RDs) with ChatGPT findings and determine the best criteria for assessing stage and grade. RDs were provided by four experts who examined each case. Standardized texts containing the relevant information for each situation were constructed to query ChatGPT. RDs were compared to ChatGPT's responses. Variables influencing the responses of ChatGPT were evaluated.
ChatGPT correctly identified the periodontitis stage, grade, and extent in 59.5%, 50.5%, and 84.0% of cases, respectively. Cohen's kappa values for stage, grade, and extent were 0.447, 0.284, and 0.652, respectively. A multiple correspondence analysis showed high variance between ChatGPT's staging and the variables affecting the stage (64.08%) and low variance between ChatGPT's grading and the variables affecting the grade (42.71%).
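For readers unfamiliar with the agreement statistic reported here, Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch of that calculation, not the authors' analysis code. The function name `cohens_kappa` and the six toy stage labels are hypothetical and are not drawn from the study data.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Hypothetical helper for illustration; not the study's own code.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(reference)

    # Observed agreement: share of cases where both raters give the same label.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (invented labels, NOT study data): reference stage vs. model stage.
reference_stage = ["III", "III", "II", "IV", "III", "II"]
model_stage = ["III", "II", "II", "IV", "III", "III"]

accuracy = sum(r == m for r, m in zip(reference_stage, model_stage)) / len(reference_stage)
kappa = cohens_kappa(reference_stage, model_stage)
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

In this toy example, four of six labels match (accuracy about 0.667), but kappa is lower (about 0.455) because part of that agreement would be expected by chance. The same effect explains why the kappa values reported above are lower than the corresponding percentages of correctly classified cases.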
ChatGPT's current performance in classifying periodontitis was reasonable. However, further improvements are expected to increase its effectiveness and broaden its range of functionalities (NCT05926999).
Although ChatGPT currently has limitations in accurately classifying periodontitis, the model has not been specifically trained for this task. With additional improvements, its effectiveness and capabilities might be enhanced.