Joint Lp-Norm and L2,1-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery.

Author information

Kong Xiang-Zhen, Song Yu, Liu Jin-Xing, Zheng Chun-Hou, Yuan Sha-Sha, Wang Juan, Dai Ling-Yun

Affiliation

School of Computer Science, Qufu Normal University, Rizhao, China.

Publication information

Front Genet. 2021 Feb 23;12:621317. doi: 10.3389/fgene.2021.621317. eCollection 2021.

DOI:10.3389/fgene.2021.621317

PMID:33708239

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7940841/

Abstract

Dimensionality reduction methods with different norm constraints play an important role in mining useful information from large-scale gene expression data. In this article, a novel method named Lp-norm and L2,1-norm constrained graph Laplacian principal component analysis (PL21GPCA), based on traditional principal component analysis (PCA), is proposed for robust tumor sample clustering and gene network module discovery. Three aspects are highlighted in the PL21GPCA method. First, to reduce the high sensitivity to outliers and noise, the non-convex proximal Lp-norm (0 < p < 1) constraint is applied to the loss function. Second, to enhance the sparsity of gene expression in cancer samples, the L2,1-norm constraint is used on one of the regularization terms. Third, to retain the geometric structure of the data, a graph Laplacian regularization term is introduced into the PL21GPCA optimization model. Extensive experiments on five gene expression datasets, including one benchmark dataset, two single-cancer datasets from The Cancer Genome Atlas (TCGA), and two integrated datasets of multiple cancers from TCGA, are performed to validate the effectiveness of the method. The experimental results demonstrate that the PL21GPCA method performs better than many other methods in terms of tumor sample clustering. Additionally, the method is used to discover gene network modules in order to find key genes that may be associated with some cancers.
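The three ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact PL21GPCA objective, so everything here is an assumption for illustration: the function names are hypothetical, the Lp loss is taken as an elementwise sum of |r|^p over a residual matrix, the L2,1-norm is the sum of row-wise Euclidean norms, and the graph term is tr(Q^T L Q) with L = D - W. How the paper weights and combines these terms is not shown.

```python
import math


def lp_loss(residual, p):
    """Sum of |r_ij|**p over all entries (assumed elementwise form), 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    return sum(abs(r) ** p for row in residual for r in row)


def l21_norm(matrix):
    """L2,1-norm: sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def graph_laplacian(weights):
    """Unnormalized Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(embedding, weights):
    """tr(Q^T L Q), where row i of `embedding` is the low-dimensional point q_i."""
    lap = graph_laplacian(weights)
    n, k = len(embedding), len(embedding[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(embedding[i][c] * embedding[j][c] for c in range(k))
            total += lap[i][j] * dot
    return total
```

With p below 1, a large residual contributes far less than it would under a squared loss, which is the sense in which the loss is less sensitive to outliers. The L2,1-norm drives whole rows toward zero, and the Laplacian penalty is small when samples connected in the graph stay close in the reduced space.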


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/156b/7940841/ec1b8e27be4e/fgene-12-621317-g001.jpg



