van Vliet Martin H, Reyal Fabien, Horlings Hugo M, van de Vijver Marc J, Reinders Marcel J T, Wessels Lodewyk F A
Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.
BMC Genomics. 2008 Aug 6;9:375. doi: 10.1186/1471-2164-9-375.
Michiels et al. (Lancet 2005; 365: 488-92) employed a resampling strategy to show that the genes identified as predictors of prognosis from resamplings of a single gene expression dataset are highly variable. The genes most frequently identified in the separate resamplings were put forward as a 'gold standard'. On a higher level, breast cancer datasets collected by different institutions can be considered as resamplings from the underlying breast cancer population. The limited overlap between published prognostic signatures confirms the trend of signature instability identified by the resampling strategy. Six breast cancer datasets, totaling 947 samples, all measured on the Affymetrix platform, are currently available. This provides a unique opportunity to employ a substantial dataset to investigate the effects of pooling datasets on classifier accuracy, signature stability and enrichment of functional categories.
We show that the resampling strategy produces a suboptimal ranking of genes, which cannot be considered a 'gold standard'. When pooling breast cancer datasets, we observed a synergetic effect on the classification performance in 73% of the cases. We also observed a significant positive correlation between the number of datasets that are pooled, the validation performance, the number of genes selected, and the enrichment of specific functional categories. In addition, we have evaluated the support for five explanations that have been postulated for the limited overlap of signatures.
The limited overlap of current signature genes can be attributed to small sample size. Pooling datasets results in more accurate classification and a convergence of signature genes. We therefore advocate the analysis of new data within the context of a compendium, rather than analysis in isolation.
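The resampling strategy referred to above can be illustrated with a minimal sketch. This is not the procedure of Michiels et al. or of this study: the toy data generator, the function names (`make_toy_data`, `top_genes`, `selection_frequency`), the scoring rule (absolute difference in class means) and all parameter values are assumptions chosen only to show how selection frequencies across resamplings might be computed.

```python
import random
from collections import Counter
from statistics import mean


def make_toy_data(n_samples=60, n_genes=50, n_informative=3, shift=1.5, seed=1):
    """Hypothetical data: rows are genes, columns are samples, labels are 0/1."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    expr = []
    for g in range(n_genes):
        # Only the first n_informative genes differ between the two classes.
        delta = shift if g < n_informative else 0.0
        expr.append([rng.gauss(delta * labels[i], 1.0) for i in range(n_samples)])
    return expr, labels


def top_genes(expr, labels, sample_idx, k):
    """Return the k genes with the largest absolute difference in class means
    (an assumed, simplified ranking criterion), using only sample_idx."""
    pos = [i for i in sample_idx if labels[i] == 1]
    neg = [i for i in sample_idx if labels[i] == 0]
    scores = []
    for g, row in enumerate(expr):
        diff = abs(mean(row[i] for i in pos) - mean(row[i] for i in neg))
        scores.append((diff, g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def selection_frequency(expr, labels, n_resamplings=200, train_size=30, k=5, seed=0):
    """Fraction of resamplings in which each gene lands in the top-k list."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    done = 0
    while done < n_resamplings:
        idx = rng.sample(range(n), train_size)  # training subset, no replacement
        if len({labels[i] for i in idx}) < 2:
            continue  # skip subsets that contain only one class
        counts.update(top_genes(expr, labels, idx, k))
        done += 1
    return {g: c / n_resamplings for g, c in counts.items()}


if __name__ == "__main__":
    expr, labels = make_toy_data()
    freq = selection_frequency(expr, labels)
    # Print the ten most frequently selected genes and their frequencies.
    for gene, f in sorted(freq.items(), key=lambda kv: -kv[1])[:10]:
        print(gene, round(f, 2))
```

In this toy setting, genes that are selected in only a small fraction of resamplings correspond to the unstable part of a signature; increasing `train_size` (loosely analogous to pooling datasets) would be expected to concentrate the selections on fewer genes.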