Ivanescu A E, Li P, George B, Brown A W, Keith S W, Raju D, Allison D B
Department of Mathematical Sciences, Montclair State University, Montclair, NJ, USA.
Office of Energetics and Nutrition Obesity Research Center, University of Alabama at Birmingham, Birmingham, AL, USA.
Int J Obes (Lond). 2016 Jun;40(6):887-94. doi: 10.1038/ijo.2015.214. Epub 2015 Oct 9.
Deriving statistical models to predict one variable from one or more other variables, or predictive modeling, is an important activity in obesity and nutrition research. To determine the quality of the model, it is necessary to quantify and report the predictive validity of the derived models. Conducting validation of the predictive measures provides essential information to the research community about the model. Unfortunately, many articles fail to account for the nearly inevitable reduction in predictive ability that occurs when a model derived on one data set is applied to a new data set. Under some circumstances, the predictive validity can be reduced to nearly zero. In this overview, we explain why reductions in predictive validity occur, define the metrics commonly used to estimate the predictive validity of a model (for example, coefficient of determination (R²), mean squared error, sensitivity, specificity, receiver operating characteristic and concordance index) and describe methods to estimate the predictive validity (for example, cross-validation, bootstrap, and adjusted and shrunken R²). We emphasize that methods for estimating the expected reduction in predictive ability of a model in new samples are available and this expected reduction should always be reported when new predictive models are introduced.
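As a rough illustration of the reduction in predictive ability described in the abstract, the sketch below compares the apparent (in-sample) R² of a one-predictor least-squares model with its adjusted R² and a k-fold cross-validated R². It is not taken from the article: the simulated data, the function names (`r_squared`, `adjusted_r2`, `fit_line`, `cross_validated_r2`), the interleaved fold assignment and the choice of k = 5 are all assumptions made for illustration. The adjusted R² uses the common formula 1 - (1 - R²)(n - 1)/(n - p - 1), which may differ from the specific shrinkage estimators the article discusses.

```python
import random


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst


def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validated_r2(x, y, k=5):
    """R-squared computed from out-of-fold predictions.

    Folds are interleaved (every k-th observation); this assumes the
    rows are in random order, which holds for the simulated data below.
    """
    n = len(x)
    y_hat = [0.0] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(
            [x[i] for i in train_idx], [y[i] for i in train_idx]
        )
        # Predict only the observations that were held out of the fit.
        for i in test_idx:
            y_hat[i] = intercept + slope * x[i]
    return r_squared(y, y_hat)


# Simulated data (hypothetical): a weak linear signal plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(40)]
y = [0.5 * a + rng.gauss(0, 1) for a in x]

slope, intercept = fit_line(x, y)
apparent = r_squared(y, [intercept + slope * a for a in x])
adjusted = adjusted_r2(apparent, n=len(x), p=1)
cv = cross_validated_r2(x, y, k=5)

print(f"apparent R^2:        {apparent:.3f}")
print(f"adjusted R^2:        {adjusted:.3f}")
print(f"cross-validated R^2: {cv:.3f}")
```

In a sketch like this, the apparent R² is the optimistic figure, while the adjusted and cross-validated values approximate what might be expected in a new sample. The gap typically widens as the number of predictors grows relative to the sample size.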