Camlet Albert, Kusiak Aida, Ossowska Agata, Świetlik Dariusz
Department of Periodontology and Oral Mucosa Diseases, Medical University of Gdansk, Orzeszkowej 18 St., 80-208 Gdansk, Poland.
Division of Biostatistics and Neural Networks, Medical University of Gdansk, Debinki 1 St., 80-211 Gdansk, Poland.
Diagnostics (Basel). 2025 Jul 23;15(15):1851. doi: 10.3390/diagnostics15151851.
Background/Objectives: Periodontitis is a multifactorial disease leading to the loss of clinical attachment and alveolar bone. The diagnosis of periodontitis involves a clinical examination and radiographic evaluation, including panoramic images. Panoramic radiographs are a cost-effective method widely used in periodontitis classification. The remaining bone height (RBH) is a parameter used to assess the alveolar bone level. Large language models are widely used in the medical sciences. ChatGPT, the leading conversational model, has recently been extended to process visual data. The aim of this study was to assess the effectiveness of the ChatGPT models 4.5, o1, o3 and o4-mini-high in RBH measurement and tooth counting compared with evaluations by dental professionals.
Methods: The analysis was based on 10 panoramic images, from which 252, 251, 246 and 271 approximal sites qualified for RBH measurement with the models 4.5, o1, o3 and o4-mini-high, respectively. Three examiners independently evaluated the RBH at approximal sites, while tooth counts were determined by consensus. The results were then compared with the ChatGPT outputs.
Results: ChatGPT 4.5, ChatGPT o3 and ChatGPT o4-mini-high achieved substantial agreement with clinicians in the assessment of tooth counts (κ = 0.65, κ = 0.66 and κ = 0.69, respectively), whereas ChatGPT o1 achieved moderate agreement (κ = 0.52). For RBH values, all ChatGPT models consistently showed a positive mean bias relative to the clinicians. ChatGPT 4.5 showed the lowest bias: +12 percentage points (pp) for distal surfaces (width of the 95% limits of agreement (LoAs) ~60 pp) and +11 pp for mesial surfaces (LoA width ~54 pp).
Conclusions: ChatGPT 4.5 and ChatGPT o3 show potential for assessing tooth counts on panoramic radiographs; however, their present level of accuracy is insufficient for clinical use. At the current stage of development, the ChatGPT models substantially overestimate RBH values and are therefore not applicable for classifying periodontal disease.
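The agreement figures in the abstract rest on two standard statistics: Cohen's κ, read against the Landis and Koch benchmarks (0.41 to 0.60 "moderate", 0.61 to 0.80 "substantial"), and Bland-Altman mean bias with 95% limits of agreement (bias ± 1.96 × SD of the paired differences). The following is a minimal plain-Python sketch of both, not the authors' analysis code. It assumes that tooth counts are compared as per-position presence/absence labels and that RBH is compared as paired percentage values (model minus clinician); the function names and all input numbers are hypothetical and made up for illustration, not study data.

```python
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of items on which both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over labels.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if p_e == 1:
        return float("nan")  # undefined when both raters use one single label
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Verbal label from the Landis and Koch (1977) benchmarks."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def bland_altman(model, reference, z=1.96):
    """Mean bias and 95% limits of agreement of paired differences (model - reference)."""
    if len(model) != len(reference) or len(model) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [m - r for m, r in zip(model, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


if __name__ == "__main__":
    # Hypothetical per-position labels (1 = tooth present, 0 = absent); NOT study data.
    clinicians = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
    chatgpt = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
    k = cohen_kappa(clinicians, chatgpt)
    print(f"kappa = {k:.3f} ({landis_koch(k)})")  # kappa = 0.375 (fair)

    # Hypothetical RBH values in percent at five sites; NOT study data.
    rbh_clinician = [80, 75, 90, 60, 85]
    rbh_chatgpt = [92, 88, 95, 78, 90]
    bias, lower, upper = bland_altman(rbh_chatgpt, rbh_clinician)
    print(f"bias = {bias:+.1f} pp, LoA = [{lower:.1f}, {upper:.1f}] pp, "
          f"width = {upper - lower:.1f} pp")
```

A positive bias here means the model reports higher RBH than the clinicians, which is the direction of error the abstract describes; the LoA width (about 3.92 SD of the differences) indicates how far individual site readings can scatter around that bias.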