GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.

Author information

Florian Wagner

Affiliations

Graduate Program in Computational Biology & Bioinformatics, Duke University, Durham, NC, United States of America.

Center for Genomic and Computational Biology, Duke University, Durham, NC, United States of America.

Publication information

PLoS One. 2015 Nov 17;10(11):e0143196. doi: 10.1371/journal.pone.0143196. eCollection 2015.

DOI:10.1371/journal.pone.0143196

PMID:26575370

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4648502/

Abstract

METHOD

Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping.
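The general idea of pairing PCA with a nonparametric enrichment test can be illustrated with a minimal sketch. This is not the GO-PCA implementation: the abstract does not name the enrichment test, so a one-sided rank-sum (Mann-Whitney) test with a normal approximation is swapped in here as a stand-in. The toy expression matrix, the gene set names (`toy_term_A`, `toy_term_B`), the 0.01 threshold, and all function names are assumptions made up for illustration.

```python
import math
import random


def first_pc_loadings(expr, n_iter=100, seed=0):
    """Gene loadings of the first principal component (power iteration).

    expr is a genes x samples list of lists; each gene is centered first.
    """
    centered = []
    for row in expr:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    n_genes, n_samples = len(centered), len(centered[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
    for _ in range(n_iter):
        # Project samples onto v, then map back to gene space (covariance times v).
        scores = [sum(centered[g][s] * v[g] for g in range(n_genes))
                  for s in range(n_samples)]
        v = [sum(centered[g][s] * scores[s] for s in range(n_samples))
             for g in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # The sign of a PC is arbitrary; make the largest-magnitude loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    return v


def rank_sum_pvalue(loadings, members):
    """One-sided rank-sum test: do member genes have higher loadings?

    Normal approximation without tie correction (stand-in test, an assumption).
    """
    order = sorted(range(len(loadings)), key=lambda g: loadings[g])
    rank = {g: r + 1 for r, g in enumerate(order)}
    n1 = len(members)
    n2 = len(loadings) - n1
    u = sum(rank[g] for g in members) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


def signature(expr, members):
    """Per-sample average of the centered expression of the member genes."""
    n_samples = len(expr[0])
    total = [0.0] * n_samples
    for g in members:
        mean = sum(expr[g]) / n_samples
        for s in range(n_samples):
            total[s] += (expr[g][s] - mean) / len(members)
    return total


# Toy data (hypothetical): 20 genes x 6 samples; genes 0-4 share one pattern.
rng = random.Random(1)
pattern = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0]
expr = []
for g in range(20):
    signal = pattern if g < 5 else [0.0] * 6
    expr.append([s + rng.gauss(0.0, 0.3) for s in signal])

gene_sets = {"toy_term_A": [0, 1, 2, 3, 4], "toy_term_B": [10, 11, 12, 13, 14]}

loadings = first_pc_loadings(expr)
pvals = {name: rank_sum_pvalue(loadings, m) for name, m in gene_sets.items()}
enriched = [name for name, p in pvals.items() if p < 0.01]
signatures = {name: signature(expr, gene_sets[name]) for name in enriched}
```

In this sketch, a gene set whose members cluster at the top of the loading ranking is reported as enriched, and its label is attached to a per-sample signature. A fuller version would repeat the test for further principal components and for both signs of the loadings, and could rerun the whole procedure on resampled data to gauge robustness, as the abstract suggests.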

RESULTS

I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.
