Department of Oral Diagnosis, Division of Oral Radiology, Piracicaba Dental School, University of Campinas, Piracicaba, Sao Paulo, 13414-903, Brazil.
Department of Oral Health & Medicine, University Center for Dental Medicine Basel UZB, University of Basel, Basel, 4058, Switzerland.
Clin Oral Investig. 2024 Mar 9;28(3):204. doi: 10.1007/s00784-024-05587-5.
To evaluate the performance of a commercially available Generative Pre-trained Transformer (GPT) in describing and establishing differential diagnoses for radiolucent lesions in panoramic radiographs.
Twenty-eight panoramic radiographs, each containing a single radiolucent lesion, were evaluated in consensus by three examiners and by a commercially available ChatGPT-3.5 model. Both provided descriptions of internal structure (radiodensity, loculation), periphery (margin type, cortication), shape, location (bone, side, region, teeth/structures), and effects on adjacent structures (effect, adjacent structure). Diagnostic impressions related to origin, behavior, and nature were also provided. The GPT program was additionally prompted to provide differential diagnoses. Keywords used by the GPT program were compared to those used by the examiners and scored as 0 (incorrect), 0.5 (partially correct), or 1 (correct). Mean scores and standard deviations were calculated for each description. Performance in establishing differential diagnoses was assessed using Rank-1, -2, and -3 accuracy.
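The scoring scheme described above can be illustrated with a minimal sketch. It assumes that Rank-k accuracy counts a case as correct when the examiners' diagnosis appears among the first k differential diagnoses returned by the GPT program, and that the standard deviation is the sample standard deviation; neither detail is stated in the abstract. The function names, scores, and diagnoses below are hypothetical and are not study data.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return (mean, sample standard deviation) for a list of 0 / 0.5 / 1 scores."""
    return mean(scores), stdev(scores)


def rank_k_accuracy(ranked_lists, references, k):
    """Fraction of cases whose reference diagnosis is within the top k suggestions.

    Assumption: Rank-k is cumulative, i.e. a hit at rank 1 also counts for k = 2, 3.
    """
    hits = sum(ref in ranked[:k] for ranked, ref in zip(ranked_lists, references))
    return hits / len(references)


# Hypothetical keyword scores for one descriptor (e.g. margin type) in four cases.
margin_scores = [1, 1, 0.5, 1]

# Hypothetical ranked differential diagnoses and reference diagnoses for four cases.
differentials = [
    ["dentigerous cyst", "odontogenic keratocyst", "ameloblastoma"],
    ["ameloblastoma", "radicular cyst", "odontogenic keratocyst"],
    ["radicular cyst", "ameloblastoma", "simple bone cyst"],
    ["odontogenic keratocyst", "dentigerous cyst", "radicular cyst"],
]
references = ["dentigerous cyst", "radicular cyst", "simple bone cyst", "Stafne defect"]

score_mean, score_sd = summarize_scores(margin_scores)
print(f"mean = {score_mean:.3f}, SD = {score_sd:.3f}")
for k in (1, 2, 3):
    print(f"Rank-{k} accuracy: {rank_k_accuracy(differentials, references, k):.2%}")
```

Under these assumptions, a case in which the correct diagnosis never appears in the list (the fourth hypothetical case) counts as a miss at every rank, so Rank-3 accuracy is an upper bound on Rank-1 and Rank-2 accuracy.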
Descriptions of margination, affected bone, and origin received the highest scores: 0.93, 0.93, and 0.87, respectively. Shape, region, teeth/structures, effect, affected region, and nature received considerably lower scores, ranging from 0.22 to 0.50. Rank-1, -2, and -3 differential diagnoses were accurate in 25%, 57.14%, and 67.85% of cases, respectively.
The performance of the GPT program in describing and providing differential diagnoses for radiolucent lesions in panoramic radiographs is variable, and at this stage its clinical applicability is limited.
Understanding the potential role of GPT systems as an auxiliary tool in image interpretation is imperative to validate their clinical applicability.