Molecular Diagnostics Department, Eindhoven, the Netherlands.
BMC Bioinformatics. 2009 Nov 26;10:389. doi: 10.1186/1471-2105-10-389.
Large discrepancies in signature composition and outcome concordance have been observed between different microarray breast cancer expression profiling studies. This is often ascribed to differences in array platform as well as biological variability. We conjecture that other reasons for the observed discrepancies are the measurement error associated with each feature and the choice of preprocessing method. Microarray data are known to be subject to technical variation and the confidence intervals around individual point estimates of expression levels can be wide. Furthermore, the estimated expression values also vary depending on the selected preprocessing scheme. In microarray breast cancer classification studies, however, these two forms of feature variability are almost always ignored and hence their exact role is unclear.
We have performed a comprehensive sensitivity analysis of microarray breast cancer classification under the two types of feature variability mentioned above. We used data produced by six state-of-the-art preprocessing methods on a compendium of eight different datasets, comprising 1131 hybridizations and covering both one- and two-color array technology. For a wide range of classifiers, we performed a joint study of performance, concordance and stability. In the stability analysis we explicitly tested classifiers for their noise tolerance by using perturbed expression profiles based on uncertainty information directly related to the preprocessing methods. Our results indicate that signature composition is strongly influenced by feature variability, even if the array platform and the stratification of patient samples are identical. In addition, we show that there is often a high level of discordance between individual class assignments for signatures constructed on data from different preprocessing schemes, even if the actual signature composition is identical.
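The idea behind the stability and concordance analyses can be illustrated with a minimal sketch. It is not the procedure used in the study: the nearest-centroid classifier, the Gaussian noise model, the per-feature standard errors `se`, and all function names are assumptions chosen for illustration only. Each expression profile is perturbed repeatedly according to its uncertainty, and the fraction of replicates that keep the original class assignment is recorded per sample. A second helper measures the agreement between class assignments obtained from two preprocessing schemes.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Fit a nearest-centroid classifier (illustrative stand-in for any classifier)."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each sample (row of X) to the class with the closest centroid."""
    classes, centroids = model
    # Squared Euclidean distance from every sample to every centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def perturbation_stability(model, X, se, n_rep=100, seed=0):
    """Per-sample fraction of noisy replicates that keep the unperturbed label.

    `se` holds assumed standard errors of the expression estimates
    (same shape as X, or broadcastable to it); noise is assumed Gaussian.
    """
    rng = np.random.default_rng(seed)
    base = nearest_centroid_predict(model, X)
    agree = np.zeros(X.shape[0])
    for _ in range(n_rep):
        noisy = X + rng.normal(0.0, 1.0, size=X.shape) * se
        agree += nearest_centroid_predict(model, noisy) == base
    return agree / n_rep


def concordance(labels_a, labels_b):
    """Fraction of samples given the same class under two preprocessing schemes."""
    return float(np.mean(np.asarray(labels_a) == np.asarray(labels_b)))
```

A stability value close to 1 for a sample means its class assignment is robust to the assumed measurement error, whereas values near 0.5 in a two-class problem indicate an assignment that is effectively arbitrary given the noise.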
Feature variability can have a strong impact on breast cancer signature composition, as well as on the classification of individual patient samples. We therefore strongly recommend that feature variability be considered when analyzing data from microarray breast cancer expression profiling experiments.