goSTAG: gene ontology subtrees to tag and annotate genes within a set.

Author information

Bennett Brian D, Bushel Pierre R

Affiliations

Integrative Bioinformatics Group, National Institute of Environmental Health Sciences, Research Triangle Park, 27709 NC USA.

Microarray and Genome Informatics Group, National Institute of Environmental Health Sciences, 111 T.W. Alexander Drive, Research Triangle Park, 27709 NC USA.

Publication information

Source Code Biol Med. 2017 Apr 13;12:6. doi: 10.1186/s13029-017-0066-1. eCollection 2017.

Abstract

BACKGROUND

Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, there are often hundreds of statistically significant GO terms per gene set. Comparing enriched categories across a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, ascertaining biological themes representative of the samples from the interpretation of the enriched categories can be highly subjective.
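
To make the idea of over-representation concrete, the following is a minimal Python sketch of a one-sided hypergeometric test, a common choice for ORA. The abstract does not state which significance test goSTAG applies, so the test itself, the function name `hypergeom_enrichment_p`, and its parameters are illustrative assumptions rather than the package's implementation.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Return P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background (assumed universe)
    annotated:  background genes annotated to the GO term
    sample:     number of genes in the gene list being tested
    hits:       genes in the list that are annotated to the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 annotated to the term,
# a gene list of 3 genes, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
```

A small p-value would suggest that the term is over-represented in the gene list. In practice, one such test is run per GO term and the results are usually corrected for multiple testing.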

RESULTS

We developed goSTAG, which uses GO Subtrees to Tag and Annotate Genes within a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. A GO subtree is constructed for each cluster, and the term with the most paths to the root within the subtree is used to tag and annotate the cluster as its biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. In contrast, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control.
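
The labeling rule described above (pick the term with the most paths to the root within a cluster's subtree) can be sketched in Python as follows. This is an illustration under stated assumptions, not goSTAG's R code: the term names are hypothetical placeholders rather than real GO identifiers, the subtree is assumed to be given as a child-to-parents mapping, and ties are broken alphabetically, which the abstract does not specify.

```python
from functools import lru_cache


def count_paths_to_root(parents: dict[str, list[str]], root: str) -> dict[str, int]:
    """Count distinct paths from every non-root term up to the root.

    `parents` maps each term to its parent terms within the subtree
    (a directed acyclic graph, since a GO term may have several parents).
    """
    @lru_cache(maxsize=None)
    def paths(term: str) -> int:
        if term == root:
            return 1
        # A term's path count is the sum of its parents' path counts.
        return sum(paths(parent) for parent in parents.get(term, []))

    return {term: paths(term) for term in parents if term != root}


def label_cluster(parents: dict[str, list[str]], root: str) -> str:
    """Return the term with the most paths to the root (alphabetical tie-break)."""
    counts = count_paths_to_root(parents, root)
    return max(sorted(counts), key=counts.get)


# Hypothetical subtree for one cluster; names are placeholders, not GO terms.
toy_subtree = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],   # two paths: C-A-root and C-B-root
    "D": ["C", "A"],   # three paths: two via C, one via A
    "E": ["B"],
}

path_counts = count_paths_to_root(toy_subtree, "root")
cluster_label = label_cluster(toy_subtree, "root")
```

In this toy subtree, term `D` reaches the root along three distinct paths, more than any other term, so it would be chosen as the cluster label. Memoizing the recursion keeps the count linear in the number of edges, which matters because path counts can grow quickly in a dense hierarchy.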

CONCLUSIONS

goSTAG converts gene lists from genomic analyses into biological themes by enriching biological categories and constructing GO subtrees from over-represented terms in the clusters. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4301/5390446/db887a31f4ec/13029_2017_66_Fig1_HTML.jpg
