Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, Ulrich Mansmann
Division of Molecular Genome Analysis, German Cancer Research Centre.
Stat Appl Genet Mol Biol. 2004;3:Article37. doi: 10.2202/1544-6115.1078. Epub 2004 Dec 19.
We demonstrate the concept and an implementation of a compendium for the classification of high-dimensional microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Its interactivity allows the reader to modify and extend each of these components. We address the following questions: How much does the discriminatory power of a classifier depend on the choice of the algorithm used to identify it? Which alternative classifiers would work just as well? How robust is the result? Answers to these questions are essential prerequisites for the validation and biological interpretation of a classifier. We illustrate the approach by examining these questions for a breast cancer microarray data set first studied by Huang et al. (2003).
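The first and third questions can be made concrete with a small experiment. Estimate the cross-validated error of two different classification algorithms on the same data, then repeat the cross-validation with different random fold assignments to see how stable each estimate is. The following is not the authors' compendium code. It is a minimal, self-contained Python sketch under stated assumptions. The data are synthetic: a hypothetical `make_data` in which only the first few of many features differ between two classes. Nearest-centroid and 1-nearest-neighbour classifiers stand in for whatever algorithms a real analysis would compare. All function names are illustrative.

```python
import random
import statistics


def make_data(n_per_class=20, p=200, informative=10, shift=1.5, seed=0):
    """Hypothetical synthetic 'expression' data (an assumption, not real
    microarray data): p features, of which only the first `informative`
    have a higher mean in class 1 than in class 0."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            if label == 1:
                for j in range(informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose mean training profile is closest."""
    best_label, best_d = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sq_dist(x, centroid)
        if d < best_d:
            best_label, best_d = label, d
    return best_label


def one_nn(train_X, train_y, x):
    """Assign x the label of its single nearest training profile."""
    _, label = min((sq_dist(x, r), l) for r, l in zip(train_X, train_y))
    return label


def cv_error(classify, X, y, k=5, seed=0):
    """Misclassification rate from k-fold cross-validation; `seed`
    controls the random assignment of samples to folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        test = set(fold)
        tr_X = [X[i] for i in idx if i not in test]
        tr_y = [y[i] for i in idx if i not in test]
        errors += sum(classify(tr_X, tr_y, X[i]) != y[i] for i in fold)
    return errors / len(y)


def compare(classifiers, X, y, repeats=10):
    """Mean and standard deviation of the CV error over `repeats`
    different fold assignments, for each named classifier."""
    summary = {}
    for name, clf in classifiers.items():
        errs = [cv_error(clf, X, y, seed=s) for s in range(repeats)]
        summary[name] = (statistics.mean(errs), statistics.stdev(errs))
    return summary


if __name__ == "__main__":
    X, y = make_data()
    results = compare({"nearest centroid": nearest_centroid, "1-NN": one_nn}, X, y)
    for name, (mean, sd) in results.items():
        print(f"{name}: CV error {mean:.3f} (sd over repeats {sd:.3f})")
```

The gap between the two mean errors hints at how much the result depends on the algorithm. The spread across repeats shows how much of each estimate is due to the particular fold split alone.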