Wei Fangqiao, Wu Zailong, Li Guanghui, Sun Xiangyu, Shi Xiangru, Tan Lei, Ai Tianxiang, Qu Long, Zheng Shuguo
Department of Preventive Dentistry, Peking University School and Hospital of Stomatology & National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, PR China.
China Telecom eSurfing Cloud, Dongcheng District, Beijing, PR China.
BMC Oral Health. 2025 Jul 17;25(1):1188. doi: 10.1186/s12903-025-06590-2.
Oral microbiota is a major etiological factor in the development of dental caries. Next-generation sequencing techniques have been widely used, generating vast amounts of data that remain underexplored. Advances in artificial intelligence (AI) technologies have made it possible to mine information from these large datasets. This study aimed to develop AI-driven diagnostic models and identify key microbial features for caries.
We collected raw metagenomic and full-length 16S rRNA gene sequencing data from previous studies on saliva and plaque to construct a caries AI training dataset comprising nearly 600 samples. Samples were grouped by age, sequencing method, and sampling method. Through systematic comparison of seven machine learning architectures (Logistic Regression, Random Forest, Support Vector Machines, Gradient Boosting, Convolutional Neural Networks, Feedforward Neural Networks, and Transformer models), we developed subgroup-specific caries diagnostic models, then integrated them through ensemble learning to enhance generalizability.
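The model comparison described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn, covers only the four classical models (the neural network and Transformer models are omitted), uses soft voting as one possible form of ensemble integration, and runs on synthetic data standing in for a samples-by-features abundance table. The function name `compare_models` and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, n_splits=5, seed=0):
    """Return {model name: mean cross-validated ROC AUC} (hypothetical helper)."""
    models = {
        # Scale features before the linear and kernel models.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "svm": make_pipeline(
            StandardScaler(), SVC(probability=True, random_state=seed)
        ),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    # Assumption: soft voting (averaged class probabilities) as the ensemble step.
    models["soft_voting_ensemble"] = VotingClassifier(
        estimators=list(models.items()), voting="soft"
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    # Synthetic stand-in for one subgroup: 120 samples, 30 features, binary label.
    X, y = make_classification(
        n_samples=120, n_features=30, n_informative=8, random_state=0
    )
    for name, auc in compare_models(X, y).items():
        print(f"{name}: mean AUC = {auc:.3f}")
```

In a subgroup-specific setting, a function like this would be called once per subgroup (for example, per age band and sample type), and the per-subgroup scores compared before any ensemble is built.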
The caries diagnostic model achieved a maximum AUC of 1 (accuracy of 100%) for children under 6 years old in both the saliva and plaque groups. The consistency of the top features (species and metabolic pathways) contributing to the models was demonstrated through intra- and inter-group analyses. Key caries-associated species included Streptococcus salivarius, Streptococcus parasanguinis and Veillonella dispar. Veillonella parvula was more abundant in caries plaque samples but was elevated in healthy saliva samples. Metabolic pathways such as geranylgeranyl diphosphate and fructan biosynthesis were enriched in caries, whereas Bifidobacterium shunt and peptidoglycan biosynthesis were depleted.
This work provides reliable diagnostic models for early childhood caries and establishes a robust computational framework for AI-driven microbiome analysis. By focusing on the characteristics of the oral microbiome, the study offers novel perspectives on mining and validating existing data through AI modelling.