Holtkamp Agnes, Elhennawy Karim, Cejudo Grano de Oro José E, Krois Joachim, Paris Sebastian, Schwendicke Falk
Department of Oral Diagnostics, Digital Health and Health Services Research, Charité-Universitätsmedizin Berlin, 14197 Berlin, Germany.
Department of Operative and Preventive Dentistry, Charité-Universitätsmedizin Berlin, 14197 Berlin, Germany.
J Clin Med. 2021 Mar 1;10(5):961. doi: 10.3390/jcm10050961.
The present study aimed to train deep convolutional neural networks (CNNs) to detect caries lesions on Near-Infrared Light Transillumination (NILT) imagery obtained either in vitro or in vivo and to assess the models' generalizability.
In vitro, 226 extracted posterior permanent human teeth were mounted in a diagnostic model in a dummy head. NILT images were then generated (DIAGNOcam, KaVo, Biberach) and segmented tooth-wise. In vivo, NILT images of 1319 teeth from 56 patients were obtained and segmented similarly. Proximal caries lesions were annotated pixel-wise by three experienced dentists, reviewed by a fourth dentist, and then transformed into binary labels. We trained ResNet classification models on both the in vivo and in vitro datasets and used 10-fold cross-validation to estimate the models' performance and generalizability. We used GradCAM to increase explainability.
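The labelling and cross-validation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the rule that a tooth counts as carious when any pixel is annotated, the stratification by class, and all function names (`mask_to_label`, `stratified_k_folds`, `cross_validation_splits`) are assumptions made for the example.

```python
import random
from collections import defaultdict


def mask_to_label(mask):
    """Collapse a 2-D 0/1 pixel mask into one binary tooth-level label.

    Assumption: a tooth is labelled carious (1) if any pixel is annotated.
    The abstract does not state the exact rule that was used.
    """
    return int(any(any(row) for row in mask))


def stratified_k_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds, balancing classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy masks, invented for illustration only.
    masks = [[[0, 0], [0, 1]], [[0, 0], [0, 0]]] * 50
    labels = [mask_to_label(m) for m in masks]
    for train, test in cross_validation_splits(labels, k=10):
        print(len(train), len(test))
```

Each fold serves once as the held-out test set while the remaining nine are used for training, so every tooth image is tested exactly once.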
The tooth-level prevalence of caries lesions was 41% in vitro and 49% in vivo. Models trained and tested on in vivo data performed significantly better (mean ± SD accuracy: 0.78 ± 0.04) than those trained and tested on in vitro data (accuracy: 0.64 ± 0.15; p < 0.05). When tested in vitro, the models trained in vivo showed significantly lower accuracy (0.70 ± 0.01; p < 0.01). Similarly, when tested in vivo, models trained in vitro showed significantly lower accuracy (0.61 ± 0.04; p < 0.05). In both cases, this was due to decreases in sensitivity (by 27% for models trained in vivo and by 10% for models trained in vitro).
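For readers less familiar with the reported metrics, the sketch below shows how accuracy and sensitivity are derived from tooth-level predictions and how a mean ± SD across folds is summarized. The function names and the toy values are assumptions for illustration and are not data from the study.

```python
from statistics import mean, stdev


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = caries)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all teeth that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of carious teeth that the model detected."""
    return tp / (tp + fn)


def summarize(per_fold_values):
    """Return (mean, sample SD) of a metric across cross-validation folds."""
    return mean(per_fold_values), stdev(per_fold_values)


if __name__ == "__main__":
    # Toy labels and predictions, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(tp, tn, fp, fn))
    print("sensitivity:", sensitivity(tp, fn))
```

A drop in sensitivity with stable specificity means that more carious teeth are missed (more false negatives), which lowers accuracy in proportion to the prevalence of lesions in the test set.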
Using in vitro setups to generate NILT imagery for training CNNs yields low accuracy and generalizability.
Studies employing in vitro imagery for developing deep learning models should be critically appraised for their generalizability. Applicable deep learning models for assessing NILT imagery should be trained on in vivo data.