1 Department of Operative and Preventive Dentistry, Charité-Universitätsmedizin Berlin, Berlin, Germany.
2 Clinic of Conservative Dentistry and Periodontology, University of Kiel, Kiel, Germany.
J Dent Res. 2019 Sep;98(10):1088-1095. doi: 10.1177/0022034519864889. Epub 2019 Jul 30.
Prediction models learn patterns from available data (training) and are then validated on new data (testing). Prediction modeling is increasingly common in dental research. We aimed to evaluate how different model development and validation steps affect the predictive performance of tooth loss prediction models of patients with periodontitis. Two independent cohorts (627 patients, 11,651 teeth) were followed over a mean ± SD 18.2 ± 5.6 y (Kiel cohort) and 6.6 ± 2.9 y (Greifswald cohort). Tooth loss and 10 patient- and tooth-level predictors were recorded. The impact of different model development and validation steps was evaluated: 1) model complexity (logistic regression, recursive partitioning, random forest, extreme gradient boosting), 2) sample size (full data set or 10%, 25%, or 75% of cases dropped at random), 3) prediction periods (maximum 10, 15, or 20 y or uncensored), and 4) validation schemes (internal or external by centers/time). Tooth loss was generally a rare event (880 teeth were lost). All models showed limited sensitivity but high specificity. Patients' age and tooth loss at baseline as well as probing pocket depths showed high variable importance. More complex models (random forest, extreme gradient boosting) had no consistent advantages over simpler ones (logistic regression, recursive partitioning). Internal validation (in sample) overestimated the predictive power (area under the curve up to 0.90), while external validation (out of sample) found lower areas under the curve (range 0.62 to 0.82). Reducing the sample size decreased the predictive power, particularly for more complex models. Censoring the prediction period had only limited impact. When the model was trained in one period and tested in another, model outcomes were similar to the base case, indicating temporal validation as a valid option. No model showed higher accuracy than the no-information rate. In conclusion, none of the developed models would be useful in a clinical setting, despite high accuracy. During modeling, rigorous development and external validation should be applied and reported accordingly.
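To make the contrast between internal (in-sample) and external (out-of-sample by center) validation concrete, the sketch below evaluates a simpler and a more complex classifier both ways. This is not the authors' analysis code: the data are synthetic stand-ins, and the cohort labels, predictor matrix, and model settings are illustrative assumptions only.

# Minimal sketch (synthetic stand-in data; cohort labels and predictors are hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 10))                      # 10 patient- and tooth-level predictors
center = rng.choice(["Kiel", "Greifswald"], n)    # two cohorts (hypothetical split)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 2.0).astype(int)  # rare event (~10%)

train, test = center == "Kiel", center == "Greifswald"

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=300, random_state=0))]:
    model.fit(X[train], y[train])
    auc_internal = roc_auc_score(y[train], model.predict_proba(X[train])[:, 1])  # in sample
    auc_external = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])    # out of sample
    print(f"{name}: internal AUC = {auc_internal:.2f}, external AUC = {auc_external:.2f}")

With data of this kind, the in-sample AUC of the more flexible model typically exceeds its out-of-sample AUC, mirroring the optimism of internal validation reported in the abstract; comparing accuracy against the no-information rate (always predicting "tooth retained") would be an analogous further check.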