Lakhotia Shrey, Godrej Hormazd, Kaur Amandeep, Nutakki Chaitanya Sai, Mun Michelle, Eber Pascal, Anthony Celi Leo
Helios Enter Data Warehouse IT Exp., Henry Ford Health System, Detroit, Michigan, United States of America.
Independent Researcher, Mumbai, India.
PLOS Digit Health. 2025 Jul 23;4(7):e0000940. doi: 10.1371/journal.pdig.0000940. eCollection 2025 Jul.
Artificial intelligence (AI), specifically machine learning (ML), is increasingly applied in decision-making for dental diagnosis, prognosis, and treatment. However, the methodological completeness of published models has not been rigorously assessed. We performed a scoping review of PubMed-indexed articles (English, 1 January 2018–31 December 2023) that used ML in any dental specialty. Each study was evaluated with the TRIPOD + AI rubric for key reporting elements such as data preprocessing, model validation, and clinical performance. Out of 1,506 identified studies, 280 met the inclusion criteria. Oral and maxillofacial radiology (27.5%), oral and maxillofacial surgery (15.0%), and general dentistry (14.3%) were the most represented specialties. Sixty-four studies (22.9%) lacked comparison with a clinical reference standard or existing model performing the same task. Most models focused on classification (59.6%), whereas generative applications were relatively rare (1.4%). Key gaps included limited assessment of model bias, poor outlier reporting, scarce calibration evaluation, low reproducibility, and restricted data access. ML could transform dental care, but robust calibration assessment and equity evaluation are critical for real-world adoption. Future research should prioritize error explainability, outlier reporting, reproducibility, fairness, and prospective validation.