CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data.

Author information

Slawski M, Daumer M, Boulesteix A-L

Affiliation

Sylvia Lawry Centre for Multiple Sclerosis Research, Munich, Germany.

Publication information

BMC Bioinformatics. 2008 Oct 16;9:439. doi: 10.1186/1471-2105-9-439.

DOI:10.1186/1471-2105-9-439

PMID:18925941

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2646186/

Abstract

BACKGROUND

For the last eight years, microarray-based classification has been a major topic in statistics, bioinformatics and biomedical research. Traditional methods often yield unsatisfactory results or may even be inapplicable in the so-called "p >> n" setting, where the number of predictors p far exceeds the number of observations n, hence the term "ill-posed problem". Careful model selection and evaluation that satisfy accepted good-practice standards are a very complex task for statisticians without experience in this area and for scientists with limited statistical background. The multiplicity of available methods for class prediction based on high-dimensional data is an additional practical challenge for inexperienced researchers.

RESULTS

In this article, we introduce a new Bioconductor package called CMA (standing for "Classification for MicroArrays") for automatically performing variable selection, parameter tuning, classifier construction, and unbiased evaluation of the constructed classifiers using a large number of standard methods. With little time and effort, users obtain an overview of the unbiased accuracy of most top-performing classifiers. Furthermore, the standardized evaluation framework underlying CMA can also benefit statistical research for comparison purposes, for instance when a new classifier has to be compared to existing approaches.
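The workflow named above (variable selection, classifier construction, unbiased evaluation) can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce CMA's API; every function name below is hypothetical, and the ranking score and nearest-centroid classifier are stand-ins chosen for brevity. The point it shows is that variable selection is repeated on the training rows of each cross-validation fold, so the held-out samples never influence which variables are chosen.

```python
import random
import statistics


def make_folds(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_variables(X, y, rows, n_keep):
    """Score each variable by |difference of class means| / overall std,
    computed on the given rows only, and return the top n_keep columns."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        spread = statistics.pstdev(a + b) or 1.0  # guard against zero spread
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def nearest_centroid_predict(X, y, train, test, cols):
    """Assign each test row to the class whose training centroid is closest."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [statistics.mean(X[i][j] for i in members) for j in cols]
    preds = []
    for i in test:
        dist = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroids[label]))
            for label in (0, 1)
        }
        preds.append(min(dist, key=dist.get))
    return preds


def cv_error(X, y, k=5, n_keep=10, seed=0):
    """Cross-validated misclassification rate with selection redone per fold."""
    folds = make_folds(len(y), k, seed)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        cols = select_variables(X, y, train, n_keep)  # training rows only
        preds = nearest_centroid_predict(X, y, train, test, cols)
        errors += sum(p != y[i] for p, i in zip(preds, test))
    return errors / len(y)


# Synthetic "p >> n" data (an assumption for illustration): 40 samples,
# 500 variables, of which only the first 5 differ between the two classes.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(1.5 * y[i] if j < 5 else 0.0, 1.0) for j in range(500)]
     for i in range(40)]
print(f"cross-validated error rate: {cv_error(X, y):.2f}")
```

Selecting variables once on all samples and only then cross-validating would let the test folds leak into the selection step and typically gives an optimistically biased error estimate, which is the kind of pitfall a standardized evaluation framework is meant to avoid.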

CONCLUSION

CMA is a user-friendly, comprehensive package for classifier construction and evaluation that implements most standard approaches. It is freely available from the Bioconductor website at http://bioconductor.org/packages/2.3/bioc/html/CMA.html.
