Department of Radiation Oncology (Maastro), GROW-School for Oncology and Developmental Biology, Maastricht University Medical Center, Maastricht, The Netherlands.
PLoS One. 2011;6(12):e28320. doi: 10.1371/journal.pone.0028320. Epub 2011 Dec 7.
Highly parallel analysis of gene expression has recently been used to identify gene sets or 'signatures' to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the high dimensionality of microarray data, this can lead to misinterpretation of these signatures.
We developed a method to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed by ROC analysis, calculating the area under the curve (AUC) in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has, on average, a 10% chance of reaching significance when assessed in a single dataset, but this chance ranges from 1% to ∼40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this chance.
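The random-signature test described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`auc_from_scores`, `random_signature_aucs`), the scoring of a signature as the mean expression of its genes, and the AUC cutoff of 0.65 are all assumptions made for illustration, and the expression matrix is synthetic noise rather than one of the six datasets.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """AUC via the Mann-Whitney rank-sum identity; ties get average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):          # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def random_signature_aucs(expr, labels, signature_size, n_signatures, seed=0):
    """AUC of each random signature in one dataset.

    expr is a genes x patients matrix. Scoring a signature by the mean
    expression of its genes is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    aucs = np.empty(n_signatures)
    for i in range(n_signatures):
        genes = rng.choice(n_genes, size=signature_size, replace=False)
        aucs[i] = auc_from_scores(expr[genes].mean(axis=0), labels)
    return aucs

# Synthetic stand-in for a patient dataset: 500 genes, 60 patients.
demo_rng = np.random.default_rng(1)
demo_expr = demo_rng.normal(size=(500, 60))
demo_labels = np.array([0] * 30 + [1] * 30)
demo_aucs = random_signature_aucs(demo_expr, demo_labels,
                                  signature_size=20, n_signatures=200)
# Hypothetical cutoff: fraction of random signatures with AUC >= 0.65.
fraction_prognostic = float(np.mean(demo_aucs >= 0.65))
```

Running the same batch on each dataset separately gives the dataset-specific percentage of random signatures that would pass a chosen criterion.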
We have shown that an arbitrary cut-off value is not suitable for evaluating signature significance in this type of research; instead, the cut-off should be defined for each dataset separately. Our method can be used to establish and evaluate the performance of any derived gene signature in a dataset by comparing it to thousands of randomly generated signatures. It will be of most interest in cases where few data are available and testing in multiple datasets is limited.
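One way to read the comparison against random signatures is as an empirical null distribution per dataset. The sketch below assumes the AUCs of the random signatures have already been computed for that dataset; the function names, the 95% level, and the add-one correction in the p-value are illustrative choices, not details taken from the paper.

```python
import numpy as np

def dataset_specific_cutoff(random_aucs, level=0.95):
    """AUC that a signature must exceed to beat `level` of random signatures."""
    return float(np.quantile(np.asarray(random_aucs, dtype=float), level))

def empirical_p_value(observed_auc, random_aucs):
    """Share of random signatures performing at least as well as the observed one.

    The +1 in numerator and denominator keeps the estimate above zero
    when no random signature reaches the observed AUC.
    """
    random_aucs = np.asarray(random_aucs, dtype=float)
    n_as_good = int(np.sum(random_aucs >= observed_auc))
    return (n_as_good + 1) / (len(random_aucs) + 1)

# Usage with made-up null AUCs (placeholders, not results from the study):
null_aucs = np.array([0.40, 0.50, 0.60, 0.70])
cutoff = dataset_specific_cutoff(null_aucs, level=0.5)
p_value = empirical_p_value(0.65, null_aucs)
```

Because the null distribution is rebuilt for every dataset, the same observed AUC can be unremarkable in one dataset and clearly better than chance in another.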