Consensus analysis of multiple classifiers using non-repetitive variables: diagnostic application to microarray gene expression data.

Authors

Su Zhenqiang, Hong Huixiao, Perkins Roger, Shao Xueguang, Cai Wensheng, Tong Weida

Affiliation

Department of Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China.

Publication

Comput Biol Chem. 2007 Feb;31(1):48-56. doi: 10.1016/j.compbiolchem.2007.01.001. Epub 2007 Jan 4.

PMID:17303535

Abstract

Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics/prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data, and building them is difficult; compared with a single classifier, a consensus approach that combines multiple classifiers has attributes that mitigate this difficulty. A new classification method, named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN), was proposed for the analysis of high-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each built from distinct, non-repeated genes selected for their effectiveness in class differentiation. CAMCUN thus used the most biologically relevant genes in the final classifier. The CAMCUN algorithm gave consistently more accurate predictions on two well-known datasets, for prostate cancer and leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence in the predictions for unknown samples.
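The core idea of the abstract can be illustrated with a small sketch. Genes are ranked, the ranked list is cut into disjoint blocks so that no gene is reused, one base classifier is trained per block, and the classifiers vote. The abstract does not specify the ranking criterion, the base classifier, the block size, or the combination rule. Everything below is therefore an assumption for illustration only, not the authors' implementation:

- A Welch-style t score is used for ranking.
- A nearest-centroid classifier serves as the base model.
- Blocks contain two genes each.
- The classifiers are combined by majority vote.
- The function names (`rank_genes`, `partition_genes`, `consensus_predict`, etc.) are hypothetical.

The integrated 10-fold cross-validation and randomization test are not reproduced here.

```python
import statistics
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def rank_genes(X: Matrix, y: Sequence[int]) -> List[int]:
    """Rank gene indices (best first) by a Welch-style t score, class 0 vs 1.

    Assumption: the abstract does not name the selection criterion; this score
    is only a stand-in for "effectiveness in class differentiation".
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def partition_genes(ranked: List[int], genes_per_model: int, n_models: int) -> List[List[int]]:
    """Cut the ranked list into disjoint blocks so no gene is reused across models."""
    blocks = [ranked[i * genes_per_model:(i + 1) * genes_per_model] for i in range(n_models)]
    return [blk for blk in blocks if len(blk) == genes_per_model]  # drop incomplete blocks


def fit_centroids(X: Matrix, y: Sequence[int], genes: List[int]) -> Dict[int, List[float]]:
    """Per-class mean profile over one block of genes (hypothetical base model)."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [row for row, row_lab in zip(X, y) if row_lab == lab]
        centroids[lab] = [statistics.mean(r[g] for r in rows) for g in genes]
    return centroids


def predict_centroid(centroids: Dict[int, List[float]], genes: List[int], x: List[float]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean) on `genes`."""
    def dist(lab: int) -> float:
        return sum((x[g] - c) ** 2 for g, c in zip(genes, centroids[lab]))
    return min(centroids, key=dist)


def consensus_predict(X: Matrix, y: Sequence[int], x_new: List[float],
                      genes_per_model: int = 2, n_models: int = 3) -> Tuple[int, List[int]]:
    """Majority vote over base models that each see their own non-repeated genes."""
    blocks = partition_genes(rank_genes(X, y), genes_per_model, n_models)
    votes = [predict_centroid(fit_centroids(X, y, blk), blk, x_new) for blk in blocks]
    return Counter(votes).most_common(1)[0][0], votes


# Toy data (made up for illustration): 6 samples x 6 genes, two classes.
X = [[1.0, 2.0, 3.0, 0.0, 5.0, 1.0],
     [1.2, 2.1, 2.9, 0.1, 5.1, 0.9],
     [0.9, 1.9, 3.1, 0.2, 4.9, 1.1],
     [4.0, 4.0, 4.0, 1.0, 5.0, 1.0],
     [4.1, 4.2, 3.9, 0.9, 5.2, 1.1],
     [3.9, 3.8, 4.1, 1.1, 4.8, 0.9]]
y = [0, 0, 0, 1, 1, 1]
label, votes = consensus_predict(X, y, [4.0, 4.0, 4.0, 1.0, 5.0, 1.0])  # label == 1
```

With an odd number of base models and two classes, the vote cannot tie. Because the blocks are disjoint, a weak block (here the last one, built from uninformative genes) can be outvoted by the blocks built from more discriminative genes.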

Similar articles

Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates.

J Theor Biol. 2009 Aug 7;259(3):533-40. doi: 10.1016/j.jtbi.2009.04.013. Epub 2009 May 3.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers.

Biosystems. 2007 Jul-Aug;90(1):78-86. doi: 10.1016/j.biosystems.2006.07.002. Epub 2006 Jul 10.

A new classification model with simple decision rule for discovering optimal feature gene pairs.

Comput Biol Med. 2007 Nov;37(11):1637-46. doi: 10.1016/j.compbiomed.2007.03.004. Epub 2007 May 7.

Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data.

Bioinformatics. 2005 Oct 15;21(20):3905-11. doi: 10.1093/bioinformatics/bti647. Epub 2005 Aug 30.

Biomarker discovery across annotated and unannotated microarray datasets using semi-supervised learning.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-9-S2-S7.

Induction of comprehensible models for gene expression datasets by subgroup discovery methodology.

J Biomed Inform. 2004 Aug;37(4):269-84. doi: 10.1016/j.jbi.2004.07.007.

PACK: Profile Analysis using Clustering and Kurtosis to find molecular classifiers in cancer.

Bioinformatics. 2006 Sep 15;22(18):2269-75. doi: 10.1093/bioinformatics/btl174. Epub 2006 May 8.

Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis.

Biosystems. 2006 Sep;85(3):165-76. doi: 10.1016/j.biosystems.2006.01.002. Epub 2006 Feb 21.

Cited by

A spatially localized DNA linear classifier for cancer diagnosis.

Nat Commun. 2024 May 29;15(1):4583. doi: 10.1038/s41467-024-48869-y.

sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides.

Sci Rep. 2016 Aug 25;6:32115. doi: 10.1038/srep32115.

An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era.

Genome Biol. 2014 Dec 3;15(12):523. doi: 10.1186/s13059-014-0523-y.

Selecting a single model or combining multiple models for microarray-based classifier development?--a comparative analysis based on large and diverse datasets generated from the MAQC-II project.

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S3. doi: 10.1186/1471-2105-12-S10-S3.

geneCBR: a translational tool for multiple-microarray analysis and integrative information retrieval for aiding diagnosis in cancer research.

BMC Bioinformatics. 2009 Jun 18;10:187. doi: 10.1186/1471-2105-10-187.

An integrated method for cancer classification and rule extraction from microarray data.

J Biomed Sci. 2009 Feb 24;16(1):25. doi: 10.1186/1423-0127-16-25.

Very Important Pool (VIP) genes--an application for microarray-based molecular signatures.

BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S9. doi: 10.1186/1471-2105-9-S9-S9.