PCA2GO: a new multivariate statistics based method to identify highly expressed GO-Terms.

Affiliation

Department of Cardiac Development and Remodelling, Max-Planck-Institute for Heart and Lung Research, Bad Nauheim, Germany.

Publication information

BMC Bioinformatics. 2010 Jun 21;11:336. doi: 10.1186/1471-2105-11-336.


DOI:10.1186/1471-2105-11-336
PMID:20565932
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2910024/
Abstract

BACKGROUND: Several tools have been developed to explore and search Gene Ontology (GO) databases, allowing efficient GO enrichment analysis and GO tree visualization. Nevertheless, identifying highly specific GO terms in complex data sets is relatively complicated, and displaying GO term assignments and GO enrichment results in simple tables or pie charts is not optimal. Valuable information, such as the hierarchical position of a single GO term within the GO tree (topological ordering) or its enrichment within a complex set of biological experiments, is not displayed. Pie charts based on GO tree levels are themselves one-dimensional graphs, which cannot properly or efficiently represent the hierarchical specificity for the biological system being studied.

RESULTS: Here we present a new method, which we name PCA2GO, capable of GO analysis using complex multidimensional experimental settings. We employed principal component analysis (PCA) and developed a new score, which takes into account the relative frequency of certain GO terms and their specificity (hierarchical position) within the GO graph. We evaluated the correlation between our representation score R and a standard measure of enrichment, namely p-values, to relate our approach to other methods and to point out differences between our method and commonly used enrichment analyses. Although p-values and the R score formally measure different quantities, they should be correlated, because the relative frequency of GO term occurrences within a dataset is an indirect measure of the number of proteins related to that term and is therefore also related to enrichment. We showed that our score enables us to identify more specific GO terms, i.e. those positioned further down the GO graph, than other common tools used for this purpose. PCA2GO allows visualization and detection of multidimensional dependencies both within the acyclic graph (GO tree) and across the experimental settings. Our method is intended for the analysis of several experimental sets, not for a single set as with standard enrichment tools. To demonstrate the usefulness of our approach, we performed a PCA2GO analysis of a fractionated cardiomyocyte protein dataset, which was identified by enhanced liquid chromatography-mass spectrometry (GeLC-MS). The analysis enabled us to detect distinct groups of proteins, which accurately reflect properties of biochemical cell fractions.

CONCLUSIONS: We conclude that PCA2GO is an efficient alternative GO analysis tool with unique features for detecting and visualizing multidimensional dependencies within the dataset under study. PCA2GO reveals strongly correlated GO terms within the experimental setting (in this case, different fractions) through PCA group formation, and it detects more specific GO terms within experiment-dependent GO term groups better than standard p-value calculations.
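The general idea described in the abstract (relative GO term frequencies per experimental set, weighted by specificity, then explored with PCA) can be illustrated with a minimal Python sketch. The abstract does not give the formula for the R score, so `weighted_score` below is a hypothetical stand-in (relative frequency multiplied by GO depth) and is not the authors' definition. The GO identifiers, depths, counts, and the three-fraction layout are invented for illustration, and the leading principal axis is found by plain power iteration rather than by whatever routine PCA2GO itself uses.

```python
import math

def relative_frequencies(counts):
    """Normalise each fraction (column) of GO term counts so it sums to 1."""
    terms = list(counts)
    n_sets = len(counts[terms[0]])
    totals = [sum(counts[t][j] for t in terms) for j in range(n_sets)]
    return {t: [counts[t][j] / totals[j] if totals[j] else 0.0
                for j in range(n_sets)] for t in terms}

def weighted_score(freqs, depth):
    """Hypothetical score: relative frequency times GO depth (NOT the paper's R)."""
    return {t: [f * depth[t] for f in row] for t, row in freqs.items()}

def first_principal_component(rows, iterations=200):
    """Leading PCA axis of rows (observations x variables) by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the variables (here: fractions).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Non-uniform start vector; assumed not orthogonal to the leading axis.
    v = [float(j + 1) for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to follow
            break
        v = [x / norm for x in w]
    # Coordinate of each observation (GO term) along the leading axis.
    projections = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, projections

# Invented example: four GO terms counted in three cell fractions.
counts = {
    "GO:A": [30, 2, 1],
    "GO:B": [28, 3, 2],
    "GO:C": [2, 25, 3],
    "GO:D": [1, 27, 2],
}
depth = {"GO:A": 3, "GO:B": 5, "GO:C": 4, "GO:D": 6}  # hypothetical GO levels

freqs = relative_frequencies(counts)
scores = weighted_score(freqs, depth)
axis, proj = first_principal_component([scores[t] for t in counts])
```

In this toy input, terms that are frequent in the same fraction land on the same side of the leading axis, which is the kind of grouping the abstract attributes to PCA group formation. The sign of the axis is arbitrary, so only relative positions along it are meaningful.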

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6b6/2910024/6a018a869f70/1471-2105-11-336-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6b6/2910024/d020dbd7226a/1471-2105-11-336-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6b6/2910024/b8b56154e000/1471-2105-11-336-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6b6/2910024/08eb8f4f245f/1471-2105-11-336-4.jpg

Similar articles

[1]
PCA2GO: a new multivariate statistics based method to identify highly expressed GO-Terms.

BMC Bioinformatics. 2010-6-21

[2]
GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists.

BMC Bioinformatics. 2009-2-3

[3]
How to decide which are the most pertinent overly-represented features during gene set enrichment analysis.

BMC Bioinformatics. 2007-9-11

[4]
Using OWL reasoning to support the generation of novel gene sets for enrichment analysis.

J Biomed Semantics. 2018-2-14

[5]
GOurmet: a tool for quantitative comparison and visualization of gene expression profiles based on gene ontology (GO) distributions.

BMC Bioinformatics. 2006-3-17

[6]
GO PaD: the Gene Ontology Partition Database.

Nucleic Acids Res. 2007-1

[7]
GOEAST: a web-based software toolkit for Gene Ontology enrichment analysis.

Nucleic Acids Res. 2008-7-1

[8]
Graph-based exploitation of gene ontology using GOxploreR for scrutinizing biological significance.

Sci Rep. 2020-10-7

[9]
Onto-CC: a web server for identifying Gene Ontology conceptual clusters.

Nucleic Acids Res. 2008-7-1

[10]
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

2015

Cited by

[1]
Differences in gene expression profiles in early and late stage rhodesiense HAT individuals in Malawi.

PLoS Negl Trop Dis. 2023-12

[2]
pcaGoPromoter--an R package for biological and regulatory interpretation of principal components in genome-wide gene expression data.

PLoS One. 2012-2-27
