Kula Betul, Kula Ahmet, Bagcier Fatih, Alyanak Bulent
Department of Orthodontics, Istanbul Galata University, Istanbul, Türkiye.
Department of Prosthodontics, Uskudar University, Istanbul, Türkiye.
Korean J Orthod. 2025 Mar 25;55(2):131-141. doi: 10.4041/kjod24.106. Epub 2024 Dec 11.
This study aimed to evaluate the reliability and usefulness of information generated by Chat Generative Pre-Trained Transformer (ChatGPT) on temporomandibular joint disorders (TMD).
We asked ChatGPT about the diseases specified in the TMD classification and scored the responses using Likert reliability and usefulness scales, the modified DISCERN (mDISCERN) scale, and the Global Quality Scale (GQS).
The highest Likert scores for both reliability and usefulness were for masticatory muscle disorders (mean ± standard deviation [SD]: 6.0 ± 0), and the lowest scores were for inflammatory disorders of the temporomandibular joint (mean ± SD: 4.3 ± 0.6 for reliability, 4.0 ± 0 for usefulness). The median Likert reliability score indicated that the responses were highly reliable. The median Likert usefulness score was 5 (4-6), indicating that the responses were moderately useful. In a comparative analysis, no statistically significant differences were found in any subject for either reliability or usefulness (P = 0.083-1.000). The median mDISCERN score was 4 (3-5) for the two raters. A statistically significant difference was observed in the mean mDISCERN scores between the two raters (P = 0.046). The GQS scores indicated moderate to high quality (mean ± SD: 3.8 ± 0.8 for rater 1, 4.0 ± 0.5 for rater 2). No statistically significant correlation was found between mDISCERN and GQS scores (r = -0.006, P = 0.980).
ChatGPT-4 has significant potential and can be used as an additional source of information regarding TMD for patients and clinicians.