Wu Yongjia, Zhang Yun, Wu Yange, Zheng Qianhan, Li Xiaojun, Chen Xuepeng
Department of Orthodontics, Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Hangzhou, PR China.
J Dent. 2025 Jun;157:105755. doi: 10.1016/j.jdent.2025.105755. Epub 2025 Apr 12.
This study proposes a framework that integrates GPT-4V, a recent advanced version of ChatGPT, with multimodal pre-training techniques to improve deep learning algorithms for three-dimensional (3D) tooth segmentation of scans produced by intraoral scanners (IOSs).
The framework was developed on 1800 intraoral scans containing approximately 24,000 annotated teeth (training set: 1200 scans, 16,004 teeth; testing set: 600 scans, 7995 teeth) from the Teeth3DS dataset, which was collected from 900 patients, each contributing a maxillary and a mandibular scan. The first step of the proposed framework, ChatIOS, is to pre-process the 3D IOS data to extract 3D point clouds. GPT-4V then generates detailed descriptions of two-dimensional (2D) IOS images taken from different view angles. In the multimodal pre-training stage, triplets comprising point clouds, 2D images, and text descriptions serve as inputs. A series of ablation studies was conducted to evaluate the design of the automatic 3D tooth segmentation system. The quantitative evaluation criteria were segmentation quality, processing speed, and clinical applicability.
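The triplet input described above can be pictured as a small data structure. The following is only an illustrative sketch: the names `View`, `Triplet`, and `build_triplets`, the `angle_deg` and `image_path` fields, and the choice to pair one shared point cloud with each rendered view are assumptions made here, not details given in the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # one (x, y, z) sample from the scan surface


@dataclass(frozen=True)
class View:
    """One rendered 2D image of a scan and its text description (hypothetical)."""
    angle_deg: float   # assumed: camera angle used to render the image
    image_path: str    # assumed: location of the rendered 2D image
    description: str   # text produced by the vision-language model


@dataclass(frozen=True)
class Triplet:
    """Point cloud, 2D image, and text description for one pre-training sample."""
    points: Tuple[Point, ...]
    image_path: str
    description: str


def build_triplets(points: List[Point], views: List[View]) -> List[Triplet]:
    """Pair the scan's point cloud with every (image, description) view."""
    if not points:
        raise ValueError("point cloud is empty")
    shared_cloud = tuple(points)  # the same cloud is reused for each view
    triplets = []
    for view in views:
        if not view.description.strip():
            raise ValueError(f"missing description for {view.image_path}")
        triplets.append(Triplet(shared_cloud, view.image_path, view.description))
    return triplets
```

Under these assumptions, a scan rendered from two view angles yields two triplets that share the same point cloud but differ in image and description.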
When tested on 600 scans, ChatIOS outperformed existing benchmarks such as PointNet++ on all reported metrics: mean intersection-over-union (mIoU, from 90.3% to 93.0% for maxillary and from 89.2% to 92.3% for mandibular scans), segmentation accuracy (from 97.0% to 98.0% for maxillary and from 96.8% to 97.9% for mandibular scans), and Dice similarity coefficient (DSC, from 98.1% to 98.7% for maxillary and from 97.9% to 98.6% for mandibular scans). The model took approximately 2 s per scan to generate segmentation outputs and showed acceptable consistency with clinical expert evaluations.
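The abstract does not state how the three metrics were computed. As a minimal sketch of their standard definitions, the code below assumes that every point in a scan carries one integer label and that per-class scores are averaged with equal weight; the function names and the averaging scheme are assumptions made here, not the authors' evaluation code.

```python
from typing import List, Sequence


def _counts(pred: Sequence[int], truth: Sequence[int], label: int):
    """Return (true positives, false positives, false negatives) for one label."""
    tp = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    fp = sum(1 for p, t in zip(pred, truth) if p == label and t != label)
    fn = sum(1 for p, t in zip(pred, truth) if p != label and t == label)
    return tp, fp, fn


def _labels(pred: Sequence[int], truth: Sequence[int]) -> List[int]:
    """Labels that appear in the prediction or in the ground truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    return sorted(set(pred) | set(truth))


def mean_iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Average over labels of TP / (TP + FP + FN)."""
    scores = []
    for label in _labels(pred, truth):
        tp, fp, fn = _counts(pred, truth, label)
        scores.append(tp / (tp + fp + fn))
    return sum(scores) / len(scores)


def mean_dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Average over labels of 2*TP / (2*TP + FP + FN)."""
    scores = []
    for label in _labels(pred, truth):
        tp, fp, fn = _counts(pred, truth, label)
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of points whose predicted label matches the ground truth."""
    _labels(pred, truth)  # reuse the length checks
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)
```

For a toy scan of six points with `pred = [0, 1, 1, 2, 2, 2]` and `truth = [0, 1, 2, 2, 2, 2]`, the per-label IoU values are 1, 1/2, and 3/4, so `mean_iou` returns 0.75, while `accuracy` returns 5/6 because one point is mislabeled.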
The ChatIOS framework can improve the effectiveness and efficiency of the 3D tooth segmentation required by clinical procedures such as orthodontic and prosthetic treatment. This study is an early exploration of GPT-4V in digital dentistry and introduces a multimodal pre-training paradigm for 3D tooth segmentation.
Accurate segmentation of teeth on 3D intraoral scans is critical for orthodontic and prosthetic treatments. ChatIOS integrates GPT-4V with pre-trained vision-language models (VLMs) to gain an in-depth understanding of IOS data, which can contribute to more efficient and precise tooth segmentation systems.