Donatelli Richard E, Lee Shin-Jae
Clinical assistant professor, Department of Orthodontics, College of Dentistry, University of Florida, Gainesville, Fla.
Professor and chair, Department of Orthodontics and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, Korea.
Am J Orthod Dentofacial Orthop. 2015 Feb;147(2):272-9. doi: 10.1016/j.ajodo.2014.09.021.
The data used to test the validity of a prediction method should be different from the data used to generate the prediction model. In this study, we explored whether an independent data set is mandatory for testing the validity of a new prediction method and how validity can be tested without independent new data.
Several validation methods were compared in an example using the data from a mixed dentition analysis with a regression model. The validation errors of real mixed dentition analysis data and simulation data were analyzed for increasingly large data sets.
In both the real-data and the simulation studies, the leave-1-out cross-validation method produced the smallest validation errors, and the traditional simple validation method produced the largest. The differences between the validation methods diminished as the sample size increased.
The leave-1-out cross-validation method seems to be an optimal validation method for improving the prediction accuracy in data sets with a limited sample size.
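The contrast between simple validation and leave-1-out cross-validation can be illustrated with a minimal sketch. This is not the authors' code or data: the tooth-width numbers are synthetic, the regression is an ordinary least-squares line, the choice of mean absolute error as the validation error is an assumption, and all function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simple_validation_error(xs, ys):
    """Fit on the first half of the data, test on the second half."""
    half = len(xs) // 2
    slope, intercept = fit_line(xs[:half], ys[:half])
    test = list(zip(xs[half:], ys[half:]))
    return sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)


def leave_one_out_error(xs, ys):
    """Hold out each observation once; fit on all the others."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a mixed dentition analysis (values invented):
    # x = summed incisor widths (mm), y = summed canine/premolar widths (mm).
    for n in (10, 30, 100, 300):
        xs = [rng.gauss(23.0, 1.5) for _ in range(n)]
        ys = [10.0 + 0.5 * x + rng.gauss(0.0, 0.8) for x in xs]
        print(n, simple_validation_error(xs, ys), leave_one_out_error(xs, ys))
```

In the leave-1-out scheme every observation serves as test data exactly once, and each model is fitted on n - 1 cases rather than on a fraction of the sample, which is why it is attractive when the sample is small.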