Uribe S E, Issa J, Sohrabniya F, Denny A, Kim N N, Dayo A F, Chaurasia A, Sofi-Mahmudi A, Büttner M, Schwendicke F
Department of Conservative Dentistry and Oral Health, Riga Stradins University, Riga, Latvia.
Baltic Biomaterials Centre of Excellence, Headquarters at Riga Technical University, Riga, Latvia.
J Dent Res. 2024 Dec;103(13):1365-1374. doi: 10.1177/00220345241272052. Epub 2024 Oct 18.
The development of artificial intelligence (AI) in dentistry requires large, well-annotated datasets. However, the availability of public dental imaging datasets remains unclear. This study aimed to provide a comprehensive overview of all publicly available dental imaging datasets to address this gap and support AI development. This observational study searched all publicly available dataset resources (academic databases, preprints, and AI challenges), focusing on datasets/articles from 2020 to 2023, with PubMed searches extending back to 2011. We comprehensively searched for dental AI datasets containing images (intraoral photos, scans, radiographs, etc.) using relevant keywords. We included datasets with more than 50 images obtained from publicly available sources. We extracted dataset characteristics, patient demographics, country of origin, dataset size, ethical clearance, image details, FAIRness metrics, and metadata completeness. We screened 131,028 records and extracted 16 unique dental imaging datasets. The datasets were obtained from Kaggle (18.8%), GitHub, Google, Mendeley, PubMed, Zenodo (12.5% each), Grand-Challenge, OSF, and arXiv (6.25% each). The primary tasks were tooth segmentation (62.5%) and labeling (56.2%). Panoramic radiography was the most common imaging modality (58.8%). Of the 13 contributing countries, China contributed the most images (2,413). Of the datasets, 75% contained annotations, but the methods used to establish labels were often unclear and inconsistent. Only 31.2% of the datasets reported ethical approval, and 56.25% did not specify a license. Most data were obtained from dental clinics (50%). Intraoral radiographs had the highest findability score in the FAIR assessment, whereas cone-beam computed tomography datasets scored the lowest in all categories. These findings revealed a scarcity of publicly available dental imaging data and inconsistent metadata reporting.
To promote the development of robust, equitable, and generalizable AI tools for dental diagnostics, treatment, and research, efforts are needed to address data scarcity, increase diversity, mandate metadata completeness, and ensure FAIRness in AI dental imaging research.
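As a rough consistency check on the reported shares, the percentages can be converted back into whole dataset counts. This is a minimal sketch under one stated assumption: that each percentage is a share of the 16 extracted datasets (the abstract does not give the denominator explicitly). The names `SOURCE_SHARES`, `implied_count`, and `is_whole_share` are hypothetical and used only for illustration.

```python
# Source shares (percent of datasets) as reported in the abstract.
SOURCE_SHARES = {
    "Kaggle": 18.8,
    "GitHub": 12.5,
    "Google": 12.5,
    "Mendeley": 12.5,
    "PubMed": 12.5,
    "Zenodo": 12.5,
    "Grand-Challenge": 6.25,
    "OSF": 6.25,
    "arXiv": 6.25,
}

# Assumption: every percentage above is relative to the 16 unique datasets.
TOTAL_DATASETS = 16


def implied_count(percent: float, total: int) -> int:
    """Nearest whole number of datasets that `percent` of `total` represents."""
    return round(percent / 100 * total)


def is_whole_share(percent: float, total: int, tol: float = 0.01) -> bool:
    """True if `percent` of `total` lies within `tol` of a whole dataset count.

    A tolerance is needed because some shares are rounded to one decimal
    place (e.g. 3/16 = 18.75% is reported as 18.8%).
    """
    exact = percent / 100 * total
    return abs(exact - round(exact)) <= tol


counts = {name: implied_count(p, TOTAL_DATASETS) for name, p in SOURCE_SHARES.items()}
print(counts)
# Equals TOTAL_DATASETS if each dataset is attributed to exactly one source.
print(sum(counts.values()))
print(all(is_whole_share(p, TOTAL_DATASETS) for p in SOURCE_SHARES.values()))
```

Under this assumption the source shares correspond to 3 Kaggle datasets, 2 from each of the five 12.5% sources, and 1 from each of the three 6.25% sources, which together account for all 16 datasets. The same helper applied to other figures (for example, 31.2% reporting ethical approval) gives 5 of 16.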