Su Zhenqiang, Hong Huixiao, Perkins Roger, Shao Xueguang, Cai Wensheng, Tong Weida
Department of Chemistry, University of Science and Technology of China, Hefei, Anhui 230026, China.
Comput Biol Chem. 2007 Feb;31(1):48-56. doi: 10.1016/j.compbiolchem.2007.01.001. Epub 2007 Jan 4.
Class prediction based on DNA microarray data has emerged as one of the most important applications of bioinformatics for diagnostics and prognostics. Robust classifiers are needed that use the most biologically relevant genes embedded in the data. Compared with a single classifier, a consensus approach that combines multiple classifiers has attributes that mitigate this difficulty. A new classification method named consensus analysis of multiple classifiers using non-repetitive variables (CAMCUN) was proposed for the analysis of hyper-dimensional gene expression data. The CAMCUN method combined multiple classifiers, each of which was built from distinct, non-repeated genes that were selected for their effectiveness in class differentiation. Thus, CAMCUN utilized the most biologically relevant genes in the final classifier. The CAMCUN algorithm was demonstrated to give consistently more accurate predictions for two well-known datasets, one for prostate cancer and one for leukemia. Importantly, the CAMCUN algorithm employed an integrated 10-fold cross-validation and randomization test to assess the degree of confidence of the predictions for unknown samples.
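The core idea described in the abstract, several classifiers that each use their own non-overlapping set of genes and are then combined by consensus, can be illustrated with a minimal sketch. It is not the published CAMCUN implementation: the gene-ranking score, the nearest-centroid base classifier, the majority vote, the toy data, and all function names (`rank_genes`, `consensus_fit`, `consensus_predict`, `make_toy_data`) are assumptions made for illustration, since the abstract does not specify them. The integrated 10-fold cross-validation and randomization test are not reproduced here.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_toy_data(seed=0, n_per_class=10, n_genes=10, n_informative=6):
    """Hypothetical toy data: the first n_informative genes differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label, shift in (("A", 0.0), ("B", 3.0)):
        for _ in range(n_per_class):
            row = [rng.gauss(shift if g < n_informative else 0.0, 0.5)
                   for g in range(n_genes)]
            X.append(row)
            y.append(label)
    return X, y


def rank_genes(X, y):
    """Rank gene indices by a simple two-class separation score (an assumption)."""
    first, second = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        va = [row[g] for row, lab in zip(X, y) if lab == first]
        vb = [row[g] for row, lab in zip(X, y) if lab == second]
        spread = pstdev(va) + pstdev(vb) + 1e-9  # avoid division by zero
        scores.append((abs(mean(va) - mean(vb)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)]


def fit_centroids(X, y, genes):
    """Nearest-centroid base classifier restricted to the given genes."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean([r[g] for r in rows]) for g in genes]
    return centroids


def predict_one(genes, centroids, sample):
    """Return the label whose centroid is closest to the sample on these genes."""
    def sq_dist(center):
        return sum((sample[g] - center[i]) ** 2 for i, g in enumerate(genes))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def consensus_fit(X, y, n_classifiers=3, genes_per_classifier=2):
    """Build classifiers from consecutive, disjoint slices of the ranked genes."""
    ranked = rank_genes(X, y)
    models = []
    for k in range(n_classifiers):
        genes = ranked[k * genes_per_classifier:(k + 1) * genes_per_classifier]
        if genes:  # stop adding models once the ranked list is exhausted
            models.append((genes, fit_centroids(X, y, genes)))
    return models


def consensus_predict(models, sample):
    """Majority vote over the base classifiers; also return the vote fraction."""
    votes = Counter(predict_one(genes, cents, sample) for genes, cents in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)
```

Calling `consensus_fit(*make_toy_data())` and then `consensus_predict(models, sample)` returns the winning label together with the fraction of classifiers that voted for it. Because each classifier draws from a separate slice of the ranked list, no gene is used twice, which mirrors the "non-repetitive variables" idea in the method's name.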