Appropriateness of some resampling-based inference procedures for assessing performance of prognostic classifiers derived from microarray data.

Author information

Lusa Lara, McShane Lisa M, Radmacher Michael D, Shih Joanna H, Wright George W, Simon Richard

Affiliation

Department of Experimental Oncology, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milano, Italy.

Publication information

Stat Med. 2007 Feb 28;26(5):1102-13. doi: 10.1002/sim.2598.

DOI:10.1002/sim.2598

PMID:16755534

Abstract

The goal of many gene-expression microarray profiling clinical studies is to develop a multivariate classifier to predict patient disease outcome from a gene-expression profile measured on some biological specimen from the patient. Often some preliminary validation of the predictive power of a profile-based classifier is carried out using the same data set that was used to derive the classifier. Techniques such as cross-validation or bootstrapping can be used in this setting to assess predictive power, and if applied correctly, can result in a less biased estimate of predictive accuracy of a classifier. However, some investigators have attempted to apply standard statistical inference procedures to assess the statistical significance of associations between true and cross-validated predicted outcomes. We demonstrate in this paper that naïve application of standard statistical inference procedures to these measures of association under null situations can result in greatly inflated testing type I error rates. Under alternatives of small to moderate associations, confidence interval coverage probabilities may be too low, although for very large associations coverage probabilities approach their intended values. Our results suggest that caution should be exercised in interpreting some of the claims of exceptional prognostic classifier performance that have been reported in prominent biomedical journals in the past few years.
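The naive procedure described in the abstract can be explored with a small simulation. The sketch below is not the authors' simulation design. It assumes a nearest-centroid classifier, leave-one-out cross-validation, pure-noise Gaussian "expression" data with balanced class labels, and a two-sided Fisher exact test applied to the table of true versus cross-validated predicted classes as if the predictions were independent. All function names and default parameters are hypothetical choices made for illustration.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables no more probable than the observed one.
    tail = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, tail)


def loocv_predictions(X, y):
    """Leave-one-out predictions from a nearest-centroid classifier (assumed here)."""
    p = len(X[0])
    sums = {0: [0.0] * p, 1: [0.0] * p}
    counts = {0: 0, 1: 0}
    for xi, yi in zip(X, y):
        counts[yi] += 1
        for j in range(p):
            sums[yi][j] += xi[j]
    preds = []
    for xi, yi in zip(X, y):
        dist = {}
        for cls in (0, 1):
            # Exclude the held-out sample from its own class centroid.
            held = 1 if cls == yi else 0
            m = counts[cls] - held
            dist[cls] = sum(
                (xi[j] - (sums[cls][j] - held * xi[j]) / m) ** 2 for j in range(p)
            )
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def naive_rejection_rate(n=20, p=10, n_sim=200, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    y = [0] * (n // 2) + [1] * (n - n // 2)  # labels carry no information about X
    rejections = 0
    for _ in range(n_sim):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        preds = loocv_predictions(X, y)
        a = sum(1 for t, q in zip(y, preds) if t == 0 and q == 0)
        b = sum(1 for t, q in zip(y, preds) if t == 0 and q == 1)
        c = sum(1 for t, q in zip(y, preds) if t == 1 and q == 0)
        d = sum(1 for t, q in zip(y, preds) if t == 1 and q == 1)
        if fisher_exact_two_sided(a, b, c, d) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    print("naive rejection rate under the null:", naive_rejection_rate())
```

Comparing the returned rate with alpha gives a rough indication of how far the naive test departs from its nominal level under these assumed settings. The size of any departure will depend on the classifier, the cross-validation scheme, and the test chosen.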
