Lu Conghai, Wang Juan, Liu Jinxing, Zheng Chunhou, Kong Xiangzhen, Zhang Xiaofeng
School of Information Science and Engineering, Qufu Normal University, Rizhao, China.
College of Electrical Engineering and Automation, Anhui University, Hefei, China.
Front Genet. 2020 Jan 22;10:1353. doi: 10.3389/fgene.2019.01353. eCollection 2019.
Cancer sample clustering is an important approach to cancer classification and is therefore of particular importance for cancer research. Because gene expression data are high dimensional, methods for selecting characteristic genes that discriminate well between samples are an important research area in bioinformatics. In this paper, we propose a novel integrated framework for cancer clustering, the non-negative symmetric low-rank representation with graph regularization based on score function (NSLRG-S). First, NSLRG decomposition yields a lowest-rank matrix that preserves both the local manifold information and the global structure information of the gene expression data. Second, we construct a Score function from the lowest-rank matrix to weight all features of the gene expression data and calculate a score for each feature. Third, we rank the features by score and select the feature genes for cancer sample clustering. Finally, we cluster the cancer samples with the K-means method using the selected feature genes. The experiments are conducted on The Cancer Genome Atlas (TCGA) data. Comparative experiments demonstrate that the NSLRG-S framework can significantly improve clustering performance.
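The last three steps of the pipeline described in the abstract (score the features, rank and select genes, cluster with K-means) can be illustrated with a minimal sketch. The abstract does not give the NSLRG solver or the definition of the Score function, so this sketch makes two loudly stated assumptions: the lowest-rank matrix is taken as a precomputed symmetric non-negative sample affinity matrix `Z`, and the Laplacian score is used as a stand-in for the paper's Score function. All function names (`laplacian_scores`, `kmeans`, `cluster_with_selected_genes`) are hypothetical, and the K-means here is a plain Lloyd iteration with a deterministic farthest-first start, not the authors' implementation.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def laplacian_scores(X, Z):
    """Laplacian score of every feature (column of X) on the graph Z.

    X: samples x genes, as a list of rows.
    Z: symmetric non-negative samples x samples affinity matrix
       (ASSUMPTION: stands in for the learned lowest-rank matrix).
    A lower score means the gene varies little between strongly linked samples.
    """
    n, d = len(X), len(X[0])
    deg = [sum(row) for row in Z]              # node degrees D_ii
    total = sum(deg)
    scores = []
    for j in range(d):
        col = [X[i][j] for i in range(n)]
        mean = sum(col[i] * deg[i] for i in range(n)) / total  # degree-weighted mean
        f = [v - mean for v in col]
        # f^T L f with L = D - Z equals 1/2 * sum_ik Z_ik (f_i - f_k)^2 for symmetric Z
        num = 0.5 * sum(Z[i][k] * (f[i] - f[k]) ** 2
                        for i in range(n) for k in range(n))
        den = sum(deg[i] * f[i] ** 2 for i in range(n))        # f^T D f
        scores.append(num / den if den > 0 else float("inf"))
    return scores


def kmeans(points, k, iters=50):
    """Lloyd's K-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                      for p in points]
        if new_labels == labels:               # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_with_selected_genes(X, Z, n_genes, n_clusters):
    """Score genes, keep the n_genes best-ranked ones, then cluster the samples."""
    scores = laplacian_scores(X, Z)
    keep = sorted(range(len(scores)), key=lambda j: scores[j])[:n_genes]
    reduced = [[row[j] for j in keep] for row in X]
    return keep, kmeans(reduced, n_clusters)


# Toy data (illustrative only): 6 samples, 3 genes; only gene 0 separates the groups.
X = [[1.0, 5.0, 2.0], [1.2, 1.0, 9.0], [0.9, 8.0, 4.0],
     [9.0, 4.0, 3.0], [9.1, 9.0, 8.0], [8.8, 2.0, 5.0]]
# Block-structured affinity: samples 0-2 are linked, samples 3-5 are linked.
Z = [[1.0 if (i < 3) == (k < 3) and i != k else 0.0 for k in range(6)]
     for i in range(6)]
selected, labels = cluster_with_selected_genes(X, Z, n_genes=1, n_clusters=2)
print(selected, labels)
```

In this toy setting the affinity matrix is written by hand. In the framework the abstract describes, it would instead come from the NSLRG decomposition, and the ranking would use the authors' Score function rather than the Laplacian score.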