College of Information and Communication Technology, Qufu Normal University, Rizhao, Shandong 276826, China.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1273-82. doi: 10.1109/TCBB.2011.20.
Reliable and accurate identification of tumor type is crucial for the proper treatment of cancer. In recent years, sparse representation (SR) obtained by l1-norm minimization has been shown to be robust to noise, outliers, and even incomplete measurements, and SR has been applied successfully to classification. This paper presents a new SR-based method for tumor classification using gene expression data. A set of metasamples is first extracted from the training samples; an input test sample is then represented as a linear combination of these metasamples using l1-regularized least squares. Classification is performed with a discriminating function defined on the representation coefficients. Because l1-norm minimization yields a sparse solution, the proposed method is called metasample-based SR classification (MSRC). Extensive experiments on publicly available gene expression data sets show that MSRC is efficient for tumor classification and achieves higher accuracy than many existing representative schemes.
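The three steps named in the abstract (metasample extraction, l1-regularized least-squares coding, classification from the coefficients) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how metasamples are extracted, so the sketch assumes a per-class truncated SVD (the top-k left singular vectors of each class's gene-by-sample matrix). The l1 problem is solved with ISTA (iterative soft-thresholding), a solver swapped in for brevity. The discriminating function is assumed to be the class-wise reconstruction residual used in sparse representation-based classification. The function names and the parameters `k` and `lam` are hypothetical.

```python
import numpy as np


def extract_metasamples(X, k):
    """Return k metasamples (genes x k) for one class.

    ASSUMPTION: metasamples are the top-k left singular vectors of the
    class's gene-by-sample matrix X; the abstract does not specify this.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]


def l1_least_squares(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 with ISTA.

    ASSUMPTION: ISTA is a swapped-in solver; any l1-regularized
    least-squares solver could be used instead.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))  # gradient step on the squared error
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return x


def msrc_predict(train_by_class, y, k=2, lam=0.01):
    """Classify a test sample y (one value per gene).

    train_by_class maps a class label to a (genes x samples) array.
    ASSUMPTION: the discriminating function is the class-wise residual
    ||y - A x_c||_2, where x_c keeps only that class's coefficients.
    """
    labels, blocks = [], []
    for label, X in train_by_class.items():
        M = extract_metasamples(X, k)
        blocks.append(M)
        labels.extend([label] * M.shape[1])
    A = np.hstack(blocks)  # dictionary of all metasamples, one column each
    labels = np.array(labels)
    x = l1_least_squares(A, y, lam)
    residuals = {}
    for label in train_by_class:
        x_c = np.where(labels == label, x, 0.0)  # zero out other classes
        residuals[label] = np.linalg.norm(y - A @ x_c)
    return min(residuals, key=residuals.get), residuals


# Usage on synthetic data: two hypothetical "tumor types", each drawn
# from its own low-rank basis plus a little noise.
rng = np.random.default_rng(0)
n_genes = 50
basis_a = rng.normal(size=(n_genes, 2))
basis_b = rng.normal(size=(n_genes, 2))
train = {
    "type_A": basis_a @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(n_genes, 10)),
    "type_B": basis_b @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(n_genes, 10)),
}
test_sample = basis_a @ rng.normal(size=2)
label, res = msrc_predict(train, test_sample)
print(label, res)
```

Because the test sample lies in the span of the type_A training data, its sparse code concentrates on the type_A metasamples and the type_A residual is the smaller one. On real expression data, the choice of `k`, `lam`, and any per-gene normalization would need tuning.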