A compendium to ensure computational reproducibility in high-dimensional classification tasks.

Authors

Ruschhaupt Markus, Huber Wolfgang, Poustka Annemarie, Mansmann Ulrich

Affiliation

Division of Molecular Genome Analysis, German Cancer Research Centre.

Publication information

Stat Appl Genet Mol Biol. 2004;3:Article37. doi: 10.2202/1544-6115.1078. Epub 2004 Dec 19.

DOI:10.2202/1544-6115.1078

PMID:16646817

Abstract

We demonstrate a concept and implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: how much does the discriminatory power of a classifier depend on the choice of the algorithm that was used to identify it; what alternative classifiers could be used just as well; how robust is the result. The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by examining these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).
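
The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' compendium, code, or data: it uses synthetic data, and the two classifiers (nearest centroid and 1-nearest-neighbour), the feature-ranking rule, and every function name and parameter below are assumptions chosen for illustration. It estimates cross-validated error for each classifier, repeats the estimate over several random splits as a rough robustness check, and selects features inside each training fold so that held-out samples do not influence the selection.

```python
import random
import statistics


def make_data(n_per_class=20, n_features=200, n_informative=10, shift=1.5, seed=0):
    """Synthetic two-class data (hypothetical stand-in for expression profiles)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift  # only the first features carry signal
            X.append(row)
            y.append(label)
    return X, y


def select_features(X, y, k):
    """Rank features by absolute difference of class means; keep the top k."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(r[j] for r, c in zip(X, y) if c == 0)
        m1 = statistics.fmean(r[j] for r, c in zip(X, y) if c == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(X_train, y_train, x):
    """Predict the class whose mean profile is closest to x."""
    centroids = {}
    for c in set(y_train):
        rows = [r for r, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [statistics.fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))


def one_nn(X_train, y_train, x):
    """Predict the class of the single closest training sample."""
    best = min(range(len(X_train)), key=lambda i: sq_dist(X_train[i], x))
    return y_train[best]


def cv_error(X, y, classify, k_features=10, n_folds=5, seed=0):
    """Cross-validated error rate with feature selection redone in every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = select_features(X_tr, y_tr, k_features)  # training data only
        X_tr_sel = [[r[j] for j in feats] for r in X_tr]
        for i in test:
            x = [X[i][j] for j in feats]
            errors += int(classify(X_tr_sel, y_tr, x) != y[i])
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_data()
    for name, clf in (("nearest centroid", nearest_centroid), ("1-NN", one_nn)):
        # Repeating over split seeds shows how much the estimate varies.
        errs = [cv_error(X, y, clf, seed=s) for s in range(5)]
        print(name, [round(e, 3) for e in errs])
```

Fixing the seeds makes each run repeatable, which is the property a compendium is meant to guarantee for a full analysis; the spread of error rates across split seeds gives a first impression of how stable each classifier's result is.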

Similar articles

Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis.

Genome Inform. 2008;21:200-11.

Development of biomarker classifiers from high-dimensional data.

Brief Bioinform. 2009 Sep;10(5):537-46. doi: 10.1093/bib/bbp016. Epub 2009 Apr 3.

Consensus analysis of multiple classifiers using non-repetitive variables: diagnostic application to microarray gene expression data.

Comput Biol Chem. 2007 Feb;31(1):48-56. doi: 10.1016/j.compbiolchem.2007.01.001. Epub 2007 Jan 4.

Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.

Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5.

Reliable gene signatures for microarray classification: assessment of stability and performance.

Bioinformatics. 2006 Oct 1;22(19):2356-63. doi: 10.1093/bioinformatics/btl400. Epub 2006 Jul 31.

Appropriateness of some resampling-based inference procedures for assessing performance of prognostic classifiers derived from microarray data.

Stat Med. 2007 Feb 28;26(5):1102-13. doi: 10.1002/sim.2598.

Children's questions: a mechanism for cognitive development.

Monogr Soc Res Child Dev. 2007;72(1):vii-ix, 1-112; discussion 113-26. doi: 10.1111/j.1540-5834.2007.00412.x.

How large a training set is needed to develop a classifier for microarray data?

Clin Cancer Res. 2008 Jan 1;14(1):108-14. doi: 10.1158/1078-0432.CCR-07-0443.

New gene selection method for multiclass tumor classification by class centroid.

J Biomed Inform. 2009 Feb;42(1):59-65. doi: 10.1016/j.jbi.2008.05.011. Epub 2008 Jun 17.

Cited by

Noninvasive machine-learning models for the detection of lesion-specific ischemia in patients with stable angina with intermediate stenosis severity on coronary CT angiography.

Phys Eng Sci Med. 2025 Mar;48(1):167-180. doi: 10.1007/s13246-024-01503-z. Epub 2024 Dec 30.

EEG-based Signatures of Schizophrenia, Depression, and Aberrant Aging: A Supervised Machine Learning Investigation.

Schizophr Bull. 2025 May 8;51(3):804-817. doi: 10.1093/schbul/sbae150.

Long-term Major Adverse Cardiac Event Prediction by Computed Tomography-derived Plaque Measures and Clinical Parameters Using Machine Learning.

Intern Med. 2025 Apr 1;64(7):1001-1008. doi: 10.2169/internalmedicine.3566-24. Epub 2024 Sep 4.

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges.

BMC Med. 2023 May 15;21(1):182. doi: 10.1186/s12916-023-02858-y.

Patterns of risk-Using machine learning and structural neuroimaging to identify pedophilic offenders.

Front Psychiatry. 2023 Apr 20;14:1001085. doi: 10.3389/fpsyt.2023.1001085. eCollection 2023.

Schizophrenia (Heidelb). 2023 Feb 17;9(1):11. doi: 10.1038/s41537-023-00337-0.

Machine learning-based ability to classify psychosis and early stages of disease through parenting and attachment-related variables is associated with social cognition.

BMC Psychol. 2021 Mar 23;9(1):47. doi: 10.1186/s40359-021-00552-3.

An Ensemble of Psychological and Physical Health Indices Discriminates Between Individuals with Chronic Pain and Healthy Controls with High Reliability: A Machine Learning Study.

Pain Ther. 2020 Dec;9(2):601-614. doi: 10.1007/s40122-020-00191-3. Epub 2020 Sep 3.

From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer.

mSystems. 2016 Dec 27;1(6). doi: 10.1128/mSystems.00101-16. eCollection 2016 Nov-Dec.

Fracture risk predictions based on statistical shape and density modeling of the proximal femur.

J Bone Miner Res. 2014 Sep;29(9):2090-100. doi: 10.1002/jbmr.2241.
