Enhancing Characteristic Gene Selection and Tumor Classification by the Robust Laplacian Supervised Discriminative Sparse PCA.

Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.

Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.

Publication information

J Chem Inf Model. 2022 Apr 11;62(7):1794-1807. doi: 10.1021/acs.jcim.1c01403. Epub 2022 Mar 30.

DOI: 10.1021/acs.jcim.1c01403

PMID: 35353532

Abstract

Characteristic gene selection and tumor classification based on gene expression data play major roles in genomic research. Because gene expression data typically combine a small sample size with high dimensionality, it is common practice to reduce dimensionality before analyzing the data with machine learning-based methods. In this context, classical principal component analysis (PCA) and its improved variants have been widely used. More recently, methods based on supervised discriminative sparse PCA have been developed to improve dimensionality reduction. However, such methods still have limitations: most of them do not combine, within a single objective function, robustness to outliers and noise, label information, sparsity, and the capture of intrinsic geometric structure. To address this drawback, we propose a novel PCA-based method, the robust Laplacian supervised discriminative sparse PCA (RLSDSPCA), which enforces the L2,1 norm on the error function and incorporates the graph Laplacian into supervised discriminative sparse PCA. To evaluate the efficacy of RLSDSPCA, we applied it to characteristic gene selection and tumor classification using gene expression data. The results demonstrate that RLSDSPCA, when used in combination with other related methods, can effectively identify new pathogenic genes associated with diseases. In addition, RLSDSPCA achieved the best tumor classification performance among the compared state-of-the-art methods in terms of major performance metrics. The code and data sets used in the study are freely available at http://csbio.njust.edu.cn/bioinf/rlsdspca/.
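
The abstract names two building blocks without defining them: the L2,1 norm on the reconstruction error and the graph Laplacian regularizer. The following NumPy sketch shows how each is commonly computed. It is not the authors' RLSDSPCA solver: the row-wise L2,1 convention (one row per sample), the symmetric 0/1 k-nearest-neighbor graph with k = 5, the plain-PCA projection and the random toy data are all assumptions for illustration, and the function names l21_norm, knn_graph_laplacian and laplacian_penalty are hypothetical.

```python
import numpy as np


def l21_norm(E: np.ndarray) -> float:
    """||E||_{2,1}: sum of the Euclidean norms of the rows of E (rows = samples, assumed)."""
    return float(np.sum(np.linalg.norm(E, axis=1)))


def knn_graph_laplacian(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Unnormalized Laplacian L = D - W of a symmetric 0/1 k-nearest-neighbor graph on the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest samples
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # connect i and j if either lists the other as a neighbor
    return np.diag(W.sum(axis=1)) - W


def laplacian_penalty(Y: np.ndarray, L: np.ndarray) -> float:
    """tr(Y^T L Y) = 1/2 * sum_ij W_ij ||y_i - y_j||^2; small when graph neighbors stay close."""
    return float(np.trace(Y.T @ L @ Y))


# Toy data: 20 samples x 50 genes (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
Xc = X - X.mean(axis=0)

# Plain PCA projection onto the top 3 components (NOT the RLSDSPCA optimization).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Q = Vt[:3].T        # 50 x 3 loading matrix
Y = Xc @ Q          # 20 x 3 low-dimensional representation
E = Xc - Y @ Q.T    # 20 x 50 reconstruction error

L = knn_graph_laplacian(Xc, k=5)
print("L2,1 error:", l21_norm(E))
print("Frobenius error:", np.linalg.norm(E))
print("Laplacian penalty:", laplacian_penalty(Y, L))
```

Because each sample's residual enters the L2,1 norm through its unsquared Euclidean norm, a single outlying sample inflates the loss linearly rather than quadratically, which is the usual motivation for this norm. The Laplacian penalty grows when samples that are neighbors in the original space are pushed apart by the projection.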


Similar articles

Enhancing Characteristic Gene Selection and Tumor Classification by the Robust Laplacian Supervised Discriminative Sparse PCA.

J Chem Inf Model. 2022 Apr 11;62(7):1794-1807. doi: 10.1021/acs.jcim.1c01403. Epub 2022 Mar 30.

Supervised Discriminative Sparse PCA for Com-Characteristic Gene Selection and Tumor Classification on Multiview Biological Data.

IEEE Trans Neural Netw Learn Syst. 2019 Oct;30(10):2926-2937. doi: 10.1109/TNNLS.2019.2893190. Epub 2019 Feb 22.

Principal Component Analysis Based on Graph Laplacian and Double Sparse Constraints for Feature Selection and Sample Clustering on Multi-View Data.

Hum Hered. 2019;84(1):47-58. doi: 10.1159/000501653. Epub 2019 Aug 29.

Incorporating biological information in sparse principal component analysis with application to genomic data.

BMC Bioinformatics. 2017 Jul 11;18(1):332. doi: 10.1186/s12859-017-1740-7.

Joint -Norm Constraint and Graph-Laplacian PCA Method for Feature Extraction.

Biomed Res Int. 2017;2017:5073427. doi: 10.1155/2017/5073427. Epub 2017 Apr 2.

K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis.

Comput Biol Med. 2024 Jun;175:108497. doi: 10.1016/j.compbiomed.2024.108497. Epub 2024 Apr 24.

Joint Lp-Norm and L-Norm Constrained Graph Laplacian PCA for Robust Tumor Sample Clustering and Gene Network Module Discovery.

Front Genet. 2021 Feb 23;12:621317. doi: 10.3389/fgene.2021.621317. eCollection 2021.

PLPCA: Persistent Laplacian-Enhanced PCA for Microarray Data Analysis.

J Chem Inf Model. 2024 Apr 8;64(7):2405-2420. doi: 10.1021/acs.jcim.3c01023. Epub 2023 Sep 22.

PCA Based on Graph Laplacian Regularization and P-Norm for Gene Selection and Clustering.

IEEE Trans Nanobioscience. 2017 Jun;16(4):257-265. doi: 10.1109/TNB.2017.2690365. Epub 2017 Mar 31.

PCA via joint graph Laplacian and sparse constraint: Identification of differentially expressed genes and sample clustering on gene expression data.

BMC Bioinformatics. 2019 Dec 30;20(Suppl 22):716. doi: 10.1186/s12859-019-3229-z.

Cited by

K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis.

Comput Biol Med. 2024 Jun;175:108497. doi: 10.1016/j.compbiomed.2024.108497. Epub 2024 Apr 24.

PLPCA: Persistent Laplacian-Enhanced PCA for Microarray Data Analysis.

J Chem Inf Model. 2024 Apr 8;64(7):2405-2420. doi: 10.1021/acs.jcim.3c01023. Epub 2023 Sep 22.

Quantitative Detection of Gastrointestinal Tumor Markers Using a Machine Learning Algorithm and Multicolor Quantum Dot Biosensor.

Comput Intell Neurosci. 2022 Sep 1;2022:9022821. doi: 10.1155/2022/9022821. eCollection 2022.
