College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China.
BMC Bioinformatics. 2022 Jan 20;22(Suppl 12):436. doi: 10.1186/s12859-021-04333-y.
Clustering and feature selection play major roles in many fields. As a matrix factorization method, Low-Rank Representation (LRR) has attracted considerable attention in clustering and feature selection, but its performance can degrade when the data samples are insufficient or contain a lot of noise.
To address this drawback, a novel LRR model named TGLRR is proposed by integrating the truncated nuclear norm with the graph Laplacian. Unlike the nuclear norm, which minimizes all singular values, the truncated nuclear norm minimizes only the smallest singular values, which avoids the harmful shrinkage of the leading singular values. Finally, an efficient algorithm based on the Linearized Alternating Direction Method with Adaptive Penalty is applied to solve the optimization problem.
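The difference between the two norms can be illustrated with a minimal NumPy sketch. It is not the TGLRR implementation: the function names, the truncation parameter `r`, and the toy matrices are assumptions made for illustration, and the graph Laplacian is built in its common unnormalized form L = D - W, which may differ from the construction used in the paper.

```python
import numpy as np


def nuclear_norm(X):
    """Sum of all singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())


def truncated_nuclear_norm(X, r):
    """Sum of the singular values of X, excluding the r largest.

    np.linalg.svd returns singular values in descending order,
    so s[r:] holds the min(m, n) - r smallest ones.
    """
    s = np.linalg.svd(X, compute_uv=False)
    return float(s[r:].sum())


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # degree matrix
    return D - W


# Toy matrix with known singular values 5, 3, 0.5 and 0.1 (illustrative only).
X = np.diag([5.0, 3.0, 0.5, 0.1])
full = nuclear_norm(X)                   # 5 + 3 + 0.5 + 0.1
trunc = truncated_nuclear_norm(X, r=2)   # 0.5 + 0.1, leading two left untouched

# Toy 3-node affinity graph (hypothetical weights).
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
L = graph_laplacian(W)
```

Penalizing `trunc` instead of `full` leaves the two leading singular values, which carry most of the structure of the data, out of the penalty, so a minimizer has no incentive to shrink them.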
The results show that the TGLRR method outperforms existing state-of-the-art methods in tumor clustering and gene selection on integrated gene expression data.