Tirrell Rob, Evani Uday, Berman Ari E, Mooney Sean D, Musen Mark A, Shah Nigam H
Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305.
AMIA Annu Symp Proc. 2010 Nov 13;2010:797-801.
Advanced statistical methods used to analyze high-throughput data (e.g., gene-expression assays) result in long lists of "significant genes." One way to gain insight into the significance of altered expression levels is to determine whether Gene Ontology (GO) terms associated with a particular biological process, molecular function, or cellular component are over- or under-represented in the set of genes deemed significant. This process, referred to as enrichment analysis, profiles a gene set, and it extends to data analysis with other high-throughput measurement modalities such as proteomics, metabolomics, and tissue-microarray assays. Because tools are now available for automatic ontology-based annotation of datasets with terms from biomedical ontologies other than the GO, we need not restrict enrichment analysis to the GO. We describe RANSUM (Rich Annotation Summarizer), which performs enrichment analysis using any ontology in the National Center for Biomedical Ontology's (NCBO) BioPortal. We outline the methodology of enrichment analysis and its associated challenges, and we discuss novel analyses enabled by RANSUM.
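As an illustration of the over- and under-representation question described above, the following is a minimal sketch of a single-term enrichment test. It assumes a hypergeometric model, which is a common choice for this kind of analysis; the abstract does not state which statistical test RANSUM uses. The function names and the gene counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k annotated genes in a sample of n genes,
    drawn without replacement from N genes of which K carry the term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_p_values(k, N, K, n):
    """Return (p_over, p_under) for one ontology term.

    k: significant genes annotated with the term
    N: genes in the background set
    K: background genes annotated with the term
    n: genes in the significant set
    Assumes a hypergeometric null model (an assumption of this sketch).
    """
    lowest = max(0, n - (N - K))   # smallest possible overlap
    highest = min(K, n)            # largest possible overlap
    if not lowest <= k <= highest:
        raise ValueError("k is outside the feasible range for N, K, n")
    # Over-representation: chance of seeing k or more annotated genes.
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    # Under-representation: chance of seeing k or fewer annotated genes.
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return p_over, p_under


if __name__ == "__main__":
    # Hypothetical counts: 20 background genes, 5 annotated with the term,
    # 5 significant genes, 3 of which carry the term.
    p_over, p_under = enrichment_p_values(k=3, N=20, K=5, n=5)
    print(f"over-representation p = {p_over:.4f}")
    print(f"under-representation p = {p_under:.4f}")
```

In practice such a test would be repeated for every term in the chosen ontology, so the resulting p-values would normally need a multiple-testing correction before any term is reported as enriched.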