Radmacher Michael D, McShane Lisa M, Simon Richard
Biometric Research Branch, National Cancer Institute, 6130 Executive Boulevard, Bethesda, MD 20892-7434, USA.
J Comput Biol. 2002;9(3):505-11. doi: 10.1089/106652702760138592.
We propose a general framework for prediction of predefined tumor classes using gene expression profiles from microarray experiments. The framework consists of 1) evaluating the appropriateness of class prediction for the given data set, 2) selecting the prediction method, 3) performing cross-validated class prediction, and 4) assessing the significance of prediction results by permutation testing. We describe an application of the prediction paradigm to gene expression profiles from human breast cancers, with specimens classified as positive or negative for BRCA1 mutations and also for BRCA2 mutations. In both cases, the accuracy of class prediction was statistically significant when compared to the accuracy of prediction expected by chance. The framework proposed here for the application of class prediction is designed to reduce the occurrence of spurious findings, a legitimate concern for high-dimensional microarray data. The prediction paradigm will serve as a good framework for comparing different prediction methods and may accelerate the development of molecular classifiers that are clinically useful.
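Steps 3 and 4 of the framework can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a classifier or gene-selection rule, so the nearest-centroid predictor, the mean-difference gene ranking, the leave-one-out scheme, the synthetic data, and all function names below are assumptions chosen for brevity. The point it shows is that gene selection is repeated inside every cross-validation fold, and that the whole cross-validation procedure is rerun on label-permuted data to estimate how often chance alone gives an error count as low as the observed one.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def make_toy_data(n_per_class=8, n_genes=20, n_informative=5, shift=4.0, seed=1):
    """Hypothetical synthetic expression data: two classes, labels 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += shift * lab  # only the first genes differ by class
            X.append(row)
            y.append(lab)
    return X, y


def select_genes(X, y, k):
    """Keep the k genes with the largest absolute difference in class means."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((abs(_mean(a) - _mean(b)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def predict(train_X, train_y, test_row, genes):
    """Nearest-centroid prediction on the selected genes (an assumed stand-in)."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(train_X, train_y) if l == lab]
        centroid = [_mean([r[g] for r in rows]) for g in genes]
        dist[lab] = sum((test_row[g] - c) ** 2 for g, c in zip(genes, centroid))
    return 0 if dist[0] <= dist[1] else 1


def loocv_errors(X, y, k):
    """Leave-one-out error count; gene selection is redone in every fold."""
    if min(y.count(0), y.count(1)) < 2:
        raise ValueError("each class needs at least two samples")
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(train_X, train_y, k)
        if predict(train_X, train_y, X[i], genes) != y[i]:
            errors += 1
    return errors


def permutation_p_value(X, y, k, n_perm=100, seed=0):
    """Share of label permutations whose error count is <= the observed one."""
    rng = random.Random(seed)
    observed = loocv_errors(X, y, k)
    hits = 0
    for _ in range(n_perm):
        permuted = y[:]
        rng.shuffle(permuted)
        if loocv_errors(X, permuted, k) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(*make_toy_data(), k=5)` returns the observed leave-one-out error count and its permutation p-value. Selecting genes once on the full data set before cross-validating would leak class information into the held-out samples and understate the error, which is the kind of spurious finding the framework is meant to guard against.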