H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia.
School of Computing and Augmented Intelligence, Arizona State University, Tempe, Arizona.
J Endod. 2024 Oct;50(10):1505-1514.e1. doi: 10.1016/j.joen.2024.07.012. Epub 2024 Aug 2.
Cone-beam computed tomography (CBCT) is widely used to detect jaw lesions, but CBCT interpretation is time-consuming and challenging. Artificial intelligence for CBCT segmentation may improve lesion detection accuracy. However, consistent automated lesion detection remains difficult, especially with limited training data. This study aimed to assess the applicability of pretrained transformer-based architectures for semantic segmentation of CBCT volumes when applied to periapical lesion detection.
CBCT volumes (n = 138) were collected and annotated by expert clinicians using 5 labels: "lesion," "restorative material," "bone," "tooth structure," and "background." A U-Net (convolutional neural network-based) model and Swin-UNETR (transformer-based) models, the latter both pretrained (Swin-UNETR-PRETRAIN) and trained from scratch (Swin-UNETR-SCRATCH), were trained on subsets of the annotated CBCTs. These models were then evaluated for semantic segmentation performance using the Sørensen-Dice coefficient (DICE), for lesion detection performance using sensitivity and specificity, and for training sample size requirements by comparing models trained with 20, 40, 60, or 103 samples.
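As a rough illustration only, the sketch below shows how such a pair of models might be instantiated and evaluated with the MONAI library; the 96³ patch size, feature size, U-Net channel progression, and the commented-out checkpoint file name are assumptions for illustration, not parameters reported in the study.

```python
import torch
from monai.networks.nets import SwinUNETR, UNet
from monai.metrics import DiceMetric

NUM_CLASSES = 5  # lesion, restorative material, bone, tooth structure, background

# Transformer-based Swin-UNETR; patch size and feature_size are assumed values.
swin = SwinUNETR(
    img_size=(96, 96, 96),  # required by older MONAI releases; newer ones drop this argument
    in_channels=1,          # single-channel CBCT intensity volume
    out_channels=NUM_CLASSES,
    feature_size=48,
)

# Hypothetical checkpoint name; MONAI's load_from() accepts self-supervised
# pretrained SwinViT encoder weights (the Swin-UNETR-PRETRAIN condition).
# Skipping this step corresponds to Swin-UNETR-SCRATCH.
# swin.load_from(weights=torch.load("model_swinvit.pt"))

# CNN-based 3D U-Net baseline; channels/strides are typical defaults, not the paper's.
unet = UNet(
    spatial_dims=3,
    in_channels=1,
    out_channels=NUM_CLASSES,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
)

# Per-class Sørensen-Dice on one-hot predictions vs. one-hot ground truth.
dice_metric = DiceMetric(include_background=True, reduction="mean_batch")
```

Calling dice_metric(y_pred, y) on one-hot tensors of shape (batch, 5, H, W, D) yields per-class scores of the kind reported in the results below.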
Trained with 103 samples, Swin-UNETR-PRETRAIN achieved a DICE of 0.8512 for "lesion," 0.8282 for "restorative material," 0.9178 for "bone," 0.9029 for "tooth structure," and 0.9901 for "background." "Lesion" DICE was statistically similar between Swin-UNETR-PRETRAIN trained with 103 and with 60 images (P > .05), with the latter achieving 1.00 sensitivity and 0.94 specificity in lesion detection. With small training sets, Swin-UNETR-PRETRAIN outperformed Swin-UNETR-SCRATCH in DICE across all labels (P < .001 [n = 20], P < .001 [n = 40]) and outperformed U-Net in lesion detection specificity (P = .006 [n = 20], P = .031 [n = 40]).
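For reference, the reported metrics carry their standard definitions: the Sørensen-Dice coefficient compares a predicted segmentation A against a ground-truth segmentation B, and lesion detection sensitivity and specificity are computed from volume-level true/false positives and negatives.

```latex
\mathrm{DICE}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}
```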
Transformer-based Swin-UNETR architectures enabled excellent semantic segmentation and periapical lesion detection. When pretrained, Swin-UNETR may offer an alternative to classic U-Net architectures when only smaller training datasets are available.