Vêncio Ricardo Z N, Koide Tie, Gomes Suely L, Pereira Carlos A de B
BIOINFO, Universidade de São Paulo, 05508-090 São Paulo, Brazil.
BMC Bioinformatics. 2006 Feb 23;7:86. doi: 10.1186/1471-2105-7-86.
The search for enriched (also called over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information in terms of classification schemes such as Gene Ontology and KEGG pathways, rather than in terms of individual genes. Although it is well known in statistics that association and significance are distinct concepts, only significance analysis has so far been used to deal with the ontology term enrichment problem.
BayGO implements a Bayesian approach to search for enriched terms in microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: a Linux version, which can be easily incorporated into existing pipelines; a Windows version, which is controlled interactively; and a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses.
The Bayesian model accounts for the fact that not all the genes from a given category may be observable in microarray data, owing to low-intensity signals, quality filters, genes that were not spotted on the array, and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of relying only on the common significance analysis.
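
To illustrate the distinction between significance and association, the sketch below contrasts the two on a single 2x2 table of term membership against differential expression. It is not BayGO's model or code: the function names, the Dirichlet prior, the Monte Carlo summary and the example counts are all assumptions chosen for illustration. The table is built from observable genes only, which is a simplification, since the sketch does not model the uncertainty about unobserved genes.

```python
import math
import random


def hypergeom_pvalue(k, n, K, N):
    """Significance: P(X >= k) when n genes are drawn from N observable
    genes, K of which carry the ontology term (one-sided hypergeometric)."""
    upper = min(n, K)
    total = math.comb(N, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / total


def posterior_log_odds_ratio(table, draws=20000, prior=1.0, seed=0):
    """Association: Monte Carlo posterior of the log odds ratio.

    table = (a, b, c, d) with
      a = term and differentially expressed (DE)
      b = term and not DE
      c = no term and DE
      d = no term and not DE
    A symmetric Dirichlet(prior) prior on the four cell probabilities is an
    illustrative assumption, not the prior used by BayGO.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Independent Gamma draws give an unnormalised Dirichlet sample;
        # the normalising constant cancels in the odds ratio.
        g = [rng.gammavariate(count + prior, 1.0) for count in table]
        samples.append(math.log((g[0] * g[3]) / (g[1] * g[2])))
    samples.sort()
    median = samples[draws // 2]
    prob_positive = sum(s > 0 for s in samples) / draws
    return median, prob_positive


# Hypothetical counts: 1000 observable genes, 40 annotated with the term,
# 100 differentially expressed, 12 in both sets.
N, K, n, k = 1000, 40, 100, 12
table = (k, K - k, n - k, N - K - n + k)

p_value = hypergeom_pvalue(k, n, K, N)
median_log_or, prob_positive = posterior_log_odds_ratio(table)

print(f"hypergeometric p-value:          {p_value:.3g}")
print(f"posterior median log odds ratio: {median_log_or:.2f}")
print(f"P(log odds ratio > 0 | data):    {prob_positive:.3f}")
```

The p-value only says how surprising the overlap is under independence, whereas the posterior of the log odds ratio describes how strong the association is and how uncertain that strength remains.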