Shen Lizhen, Jiang Hua, He Mingfang, Liu Guoqing
School of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing, 211800, China.
School of Physical and Mathematical Sciences, Nanjing Tech University, Nanjing, 211800, China.
PLoS One. 2017 Dec 13;12(12):e0189533. doi: 10.1371/journal.pone.0189533. eCollection 2017.
Microarray technology is important for simultaneously measuring the expression of many genes over a number of time points. Multiple classifier models, such as sparse representation (SR)-based methods, have been developed to classify microarray gene expression data. These methods allocate the gene data points to different clusters. In this paper, we propose a novel collaborative representation (CR)-based classification with regularized least squares to classify gene data. First, the CR codes a testing sample as a sparse linear combination of all training samples; it then classifies the testing sample by evaluating which class leads to the minimum representation error. This CR-based classification approach is remarkably less complex than traditional classification methods but yields very competitive classification results. In addition, a compressive sensing approach is adopted to project the high-dimensional gene expression dataset to a lower-dimensional space that retains nearly all of the information. This nearly lossless compression helps reduce the computational load. Experiments on detecting disease subtypes, such as those of leukemia and autism spectrum disorders, are performed by analyzing gene expression data. The results show that the proposed CR-based algorithm exhibits significantly higher stability and accuracy than traditional classifiers, such as the support vector machine algorithm.
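The two steps described in the abstract (coding a test sample over all training samples with regularized least squares, then picking the class with the smallest representation error) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names, the regularization weight `lam`, the Gaussian random matrix used as the compressive projection, and the synthetic two-class data are hypothetical choices made here for demonstration. No microarray data is involved.

```python
import numpy as np


def unit_columns(X):
    """Scale each column (one sample per column) to unit L2 norm."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)


def crc_fit(X_train, lam=0.01):
    """Precompute P = (X^T X + lam*I)^{-1} X^T for regularized least squares.

    X_train has shape (n_features, n_samples). `lam` is an assumed value.
    """
    n_samples = X_train.shape[1]
    gram = X_train.T @ X_train + lam * np.eye(n_samples)
    return np.linalg.solve(gram, X_train.T)  # shape (n_samples, n_features)


def crc_predict(X_train, y_train, P, x_test):
    """Code x_test over all training samples, then return the class whose
    own samples and coefficients give the smallest reconstruction error."""
    alpha = P @ x_test
    classes = np.unique(y_train)
    residuals = []
    for c in classes:
        mask = y_train == c
        residuals.append(np.linalg.norm(x_test - X_train[:, mask] @ alpha[mask]))
    return classes[int(np.argmin(residuals))]


def gaussian_projection(n_reduced, n_features, rng):
    """Random Gaussian matrix, a common compressive-sensing style projection
    (assumed here; the abstract does not specify the matrix)."""
    return rng.standard_normal((n_reduced, n_features)) / np.sqrt(n_reduced)


def make_toy_data(rng, n_features=2000, n_per_class=30, noise=0.5):
    """Synthetic stand-in for expression data: class mean plus noise."""
    means = rng.standard_normal((n_features, 2))
    y = np.repeat([0, 1], n_per_class)
    X = means[:, y] + noise * rng.standard_normal((n_features, y.size))
    return X, y


def accuracy(X_train, y_train, X_test, y_test, lam=0.01):
    """Fraction of test samples (columns of X_test) classified correctly."""
    X_train, X_test = unit_columns(X_train), unit_columns(X_test)
    P = crc_fit(X_train, lam)
    preds = np.array([crc_predict(X_train, y_train, P, x) for x in X_test.T])
    return float(np.mean(preds == y_test))


def run_demo(seed=0):
    """Return accuracy in the full space and in a 200-dimensional projection."""
    rng = np.random.default_rng(seed)
    X, y = make_toy_data(rng)
    test_idx = np.arange(y.size) % 3 == 0  # hold out every third sample
    X_tr, y_tr = X[:, ~test_idx], y[~test_idx]
    X_te, y_te = X[:, test_idx], y[test_idx]
    acc_full = accuracy(X_tr, y_tr, X_te, y_te)
    Phi = gaussian_projection(200, X.shape[0], rng)
    acc_proj = accuracy(Phi @ X_tr, y_tr, Phi @ X_te, y_te)
    return acc_full, acc_proj


if __name__ == "__main__":
    print(run_demo())
```

Because the coding step has a closed-form solution, the matrix `P` is computed once from the training set and reused for every test sample, which is one plausible reading of why the approach is described as less complex than iterative sparse coding. The same projection matrix must be applied to both training and test samples before classification.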