Winham Stacey J, Motsinger-Reif Alison A
Department of Statistics, North Carolina State University, Raleigh, 27695, USA.
Ann Hum Genet. 2011 Jan;75(1):46-61. doi: 10.1111/j.1469-1809.2010.00587.x.
The standard in genetic association studies of complex diseases is replication and validation of positive results, with an emphasis on assessing the predictive value of associations. In response to this need, a number of analytical approaches have been developed to identify predictive models that account for complex genetic etiologies. Multifactor Dimensionality Reduction (MDR) is a commonly used, highly successful method designed to evaluate potential gene-gene interactions. MDR relies on classification error in a cross-validation framework to rank and evaluate potentially predictive models. Previous work has demonstrated the high power of MDR, but has not considered the accuracy and variance of the MDR prediction error estimate. Here, we evaluate the bias and variance of the MDR error estimate as both a retrospective and a prospective estimator, and show that MDR can both underestimate and overestimate error. We argue that a prospective error estimate is necessary if MDR models are used for prediction, and propose a bootstrap resampling estimate that integrates population prevalence to accurately estimate prospective error. We demonstrate that, for prediction, this bootstrap estimate is preferable to the error estimate currently produced by MDR. Although demonstrated with MDR, the proposed estimation is applicable to all data-mining methods that use similar estimates.
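The difference between a retrospective and a prospective error estimate can be illustrated with a minimal sketch. It is not the authors' procedure, whose details the abstract does not give. It assumes binary labels (1 = case, 0 = control), a set of predictions already made by some fitted classifier on held-out data, and a known population prevalence. The function names `prevalence_weighted_error` and `bootstrap_prospective_error` are hypothetical, and the stratified resampling of fixed predictions is a simplifying assumption.

```python
import random


def prevalence_weighted_error(y_true, y_pred, prevalence):
    """Prospective error: prevalence * FNR + (1 - prevalence) * FPR.

    Assumes labels are 1 (case) or 0 (control). The case/control ratio in
    the sample is ignored; only the population prevalence sets the weights.
    """
    cases = [p for t, p in zip(y_true, y_pred) if t == 1]
    controls = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    fnr = sum(1 for p in cases if p == 0) / len(cases)        # missed cases
    fpr = sum(1 for p in controls if p == 1) / len(controls)  # false alarms
    return prevalence * fnr + (1 - prevalence) * fpr


def bootstrap_prospective_error(y_true, y_pred, prevalence, n_boot=1000, seed=0):
    """Bootstrap mean and standard error of the prevalence-weighted error.

    Illustrative assumption: cases and controls are resampled separately
    (stratified), and the predictions are held fixed rather than refitted.
    """
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    cases = [pair for pair in pairs if pair[0] == 1]
    controls = [pair for pair in pairs if pair[0] == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(cases, k=len(cases)) + rng.choices(controls, k=len(controls))
        labels, preds = zip(*sample)
        estimates.append(prevalence_weighted_error(labels, preds, prevalence))
    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return mean, variance ** 0.5


if __name__ == "__main__":
    # Toy balanced case-control sample (made-up values, for illustration only).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
    print(prevalence_weighted_error(y_true, y_pred, prevalence=0.1))
    print(bootstrap_prospective_error(y_true, y_pred, prevalence=0.1))
```

In a balanced case-control sample, setting the prevalence to 0.5 reproduces the ordinary (retrospective) misclassification rate. A lower prevalence shifts weight toward the false-positive rate, which is one way a retrospective estimate can under- or overestimate the error expected in the population.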