Incorporating biological information in sparse principal component analysis with application to genomic data.

Author information

Li Ziyi, Safo Sandra E, Long Qi

Affiliations

Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Road, Atlanta, 30322, GA, USA.

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, 19104, PA, USA.

Publication information

BMC Bioinformatics. 2017 Jul 11;18(1):332. doi: 10.1186/s12859-017-1740-7.

DOI:10.1186/s12859-017-1740-7

PMID:28697740

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5504598/

Abstract

BACKGROUND

Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high-dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods, called Fused and Grouped sparse PCA, that enable incorporation of prior biological information in variable selection.
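To make the idea of group-aware sparsity in PCA loadings concrete, the following is a minimal sketch. It is not the authors' Fused or Grouped sparse PCA algorithm, whose formulation is not given in this abstract. It assumes a generic thresholded power iteration in which whole groups of variables (for example, genes in one pathway) are shrunk together by a group soft-threshold. The function name `group_sparse_loading`, the penalty parameter `lam`, the group layout, and the simulated data are all hypothetical.

```python
import numpy as np

def group_sparse_loading(X, groups, lam, max_iter=200, tol=1e-8):
    """Illustrative first loading vector with group-wise soft-thresholding.

    X      : (n, p) data matrix
    groups : list of index lists partitioning the p variables (assumed known)
    lam    : group threshold on the scale of S @ v (hypothetical tuning value)
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    S = Xc.T @ Xc / Xc.shape[0]              # sample covariance matrix
    v = np.linalg.eigh(S)[1][:, -1]          # start from the ordinary PCA loading
    for _ in range(max_iter):
        u = S @ v                            # power-iteration step
        for idx in groups:
            norm = np.linalg.norm(u[idx])
            if norm <= lam:
                u[idx] = 0.0                 # drop the whole group
            else:
                u[idx] *= 1.0 - lam / norm   # shrink the group toward zero
        total = np.linalg.norm(u)
        if total == 0.0:
            return u                         # every group was removed
        v_new = u / total                    # renormalize to unit length
        if np.linalg.norm(v_new - v) < tol:
            return v_new
        v = v_new
    return v

# Simulated example: variables 0-4 share one latent factor, the rest are noise.
rng = np.random.default_rng(0)
n, p = 200, 15
z = 3.0 * rng.standard_normal(n)
X = 0.5 * rng.standard_normal((n, p))
X[:, :5] += z[:, None]
groups = [list(range(0, 5)), list(range(5, 10)), list(range(10, 15))]

v = group_sparse_loading(X, groups, lam=5.0)
selected = [j for j in range(p) if v[j] != 0.0]
print(selected)
```

Because the threshold acts on the norm of each group, variables enter or leave the loading vector together, which is one simple way prior grouping information can shape variable selection.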

RESULTS

Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma.
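The abstract does not spell out how sensitivity and specificity are computed. A minimal sketch follows, assuming the usual variable-selection definitions: sensitivity is the fraction of truly relevant variables that are selected, and specificity is the fraction of irrelevant variables that are left out. The function name `selection_metrics` and the example index sets are hypothetical.

```python
def selection_metrics(selected, truth, p):
    """Sensitivity and specificity of a selected variable set.

    selected : indices of variables with nonzero loadings
    truth    : indices of variables that are truly relevant
    p        : total number of variables
    """
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)      # relevant and selected
    fn = len(truth - selected)      # relevant but missed
    fp = len(selected - truth)      # irrelevant but selected
    tn = p - tp - fn - fp           # irrelevant and correctly excluded
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy example: 15 variables, 5 truly relevant, 4 selected (one of them wrongly).
sens, spec = selection_metrics(selected=[0, 1, 2, 7], truth=[0, 1, 2, 3, 4], p=15)
print(sens, spec)
```

In a simulation the true support is known by construction, so both quantities can be averaged over repeated datasets to compare methods.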

CONCLUSIONS

The proposed sparse PCA methods, Fused and Grouped sparse PCA, can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings, and potentially providing insights into the molecular underpinnings of complex diseases.
