Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery.

Author information

Kong Xiang-Zhen, Song Yu, Liu Jin-Xing, Zheng Chun-Hou, Yuan Sha-Sha, Wang Juan, Dai Ling-Yun

Affiliation

School of Computer Science, Qufu Normal University, Rizhao, China.

Publication information

Front Genet. 2021 Feb 23;12:621317. doi: 10.3389/fgene.2021.621317. eCollection 2021.

Abstract

Dimensionality reduction methods with different norm constraints play an important role in mining useful information from large-scale gene expression data. In this article, we propose a novel method based on traditional principal component analysis (PCA), named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA), for robust tumor sample clustering and gene network module discovery. The PL21GPCA method has three key features. First, to reduce the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1) constraint is applied to the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L2,1-norm constraint is applied to one of the regularization terms. Third, to retain the geometric structure of the data, we introduce a graph Laplacian regularization term into the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated multi-cancer datasets from TCGA, validate the effectiveness of our method. The experimental results demonstrate that PL21GPCA performs better than many other methods in tumor sample clustering. Additionally, the method is used to discover gene network modules in order to identify key genes that may be associated with some cancers.
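
The abstract names the three ingredients of the model but does not give the objective function. The Python/NumPy sketch below shows one plausible way they could combine; it is not the authors' implementation. Assumptions not stated in the abstract: X is a genes x samples matrix factored as X ≈ U V^T; the Lp loss is taken column-wise as sum_j ||x_j - U v_j||_2^p; the L2,1-norm (sum of the L2 norms of the rows) is placed on U so that whole gene rows shrink toward zero; the sample graph is a symmetric 0/1 k-nearest-neighbour graph with Laplacian L = D - W; and the weights alpha and beta, the value of k, and all helper names are hypothetical.

```python
import numpy as np


def lp_loss(E, p=0.5):
    """Sum over samples (columns) of ||e_j||_2^p with 0 < p < 1.

    Assumption: residuals are measured column-wise; the paper's exact
    Lp loss may differ.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    col_norms = np.linalg.norm(E, axis=0)
    return float(np.sum(col_norms ** p))


def l21_norm(M):
    """||M||_{2,1}: sum of the L2 norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric 0/1 k-NN graph.

    X holds one sample per column. Binary weights and k are hypothetical
    choices made for illustration.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between samples (columns)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the neighbour relation
    D = np.diag(W.sum(axis=1))
    return D - W, W


def objective(X, U, V, L, p=0.5, alpha=0.1, beta=0.1):
    """Hypothetical PL21GPCA-style objective:

    sum_j ||x_j - U v_j||_2^p + alpha * ||U||_{2,1} + beta * tr(V^T L V)
    X: genes x samples, U: genes x c, V: samples x c (one row per sample).
    """
    E = X - U @ V.T
    graph_term = float(np.trace(V.T @ L @ V))
    return lp_loss(E, p) + alpha * l21_norm(U) + beta * graph_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))  # toy data: 50 genes x 20 samples
    U = rng.standard_normal((50, 3))
    V = rng.standard_normal((20, 3))
    L, W = knn_graph_laplacian(X, k=4)
    print("objective:", objective(X, U, V, L))

    # Outlier dampening: nine residual columns of norm 1, one of norm 100
    E = np.zeros((2, 10))
    E[0, :9] = 1.0
    E[0, 9] = 100.0
    sq_total = np.sum(np.linalg.norm(E, axis=0) ** 2)
    print("outlier share, squared loss:", 100.0 ** 2 / sq_total)
    print("outlier share, p=0.5 loss:", 100.0 ** 0.5 / lp_loss(E, 0.5))
```

The graph term satisfies tr(V^T L V) = 0.5 * sum_ij W_ij ||v_i - v_j||^2, so it is small when neighbouring samples receive similar low-dimensional representations; this is how a graph Laplacian regularizer preserves local geometry. In the outlier demo, the single large residual accounts for about 99.9% of a squared loss but only 10/19 (about 53%) of the p = 0.5 loss, which illustrates why choosing 0 < p < 1 reduces sensitivity to outliers.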

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156b/7940841/ec1b8e27be4e/fgene-12-621317-g001.jpg