Fernandes Fara A, Ge Mouzhi, Chaltikyan Georgi, Gerdes Martin W, Omlin Christian W
Department of Information and Communication Technology, University of Agder (UiA), 4879 Grimstad, Norway.
Faculty European Campus Rottal-Inn, Deggendorf Institute of Technology (DIT), 84347 Pfarrkirchen, Germany.
Dentomaxillofac Radiol. 2025 Feb 1;54(2):149-162. doi: 10.1093/dmfr/twae056.
To compare the performance of the convolutional neural network (CNN) with that of the vision transformer (ViT) and the gated multilayer perceptron (gMLP) in the classification of radiographic images of dental structures.
Retrospectively collected two-dimensional images derived from cone beam computed tomographic volumes were used to train CNN, ViT, and gMLP architectures as classifiers for four different cases. The cases selected for training the architectures were the classification of the radiographic appearance of the maxillary sinuses, of the maxillary and mandibular incisors, the presence or absence of the mental foramen, and the positional relationship of the mandibular third molar to the inferior alveolar nerve canal. The performance metrics (sensitivity, specificity, precision, accuracy, and F1-score) and the areas under the receiver operating characteristic and precision-recall curves (AUC) were calculated.
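The confusion-matrix-based metrics listed above can be sketched in a few lines of numpy; the labels and predictions below are hypothetical toy values, not data from the study:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, precision, accuracy, and F1-score
    derived from the binary confusion matrix (positive class = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    sensitivity = tp / (tp + fn)   # recall / true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return dict(sensitivity=sensitivity, specificity=specificity,
                precision=precision, accuracy=accuracy, f1=f1)

# toy example with hypothetical ground truth and model output
m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

For the multi-class tasks (e.g. the third-molar positional relationship), these per-class metrics would typically be computed one-vs-rest and then averaged.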
The ViT, with an accuracy of 0.74-0.98, performed on par with the CNN model (accuracy 0.71-0.99) in all tasks. The gMLP displayed marginally lower performance (accuracy 0.65-0.98) than the CNN and ViT. For certain tasks, the ViT outperformed the CNN. Across the four cases, the AUCs ranged from 0.77 to 1.00 (CNN), 0.80 to 1.00 (ViT), and 0.73 to 1.00 (gMLP).
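The AUC-ROC values reported above have a convenient rank-based interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal numpy sketch of that computation, using hypothetical classifier scores rather than the study's data:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC-ROC via the Mann-Whitney U statistic: P(score of a random
    positive > score of a random negative), with ties counted as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # rank all scores (1-based), averaging ranks over tied values
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    n_pos = int((y_true == 1).sum())
    n_neg = int((y_true == 0).sum())
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# hypothetical labels and predicted probabilities
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 1.00, as reached by all three architectures on some tasks, means every positive case was scored above every negative case.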
The ViT and gMLP exhibited performance comparable to that of the CNN (the current state of the art). However, for certain tasks, there was a significant difference in the performance of the ViT and gMLP compared with the CNN. This task-dependent difference in model performance suggests that the distinct capabilities of different architectures may be leveraged.