Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis.

Author information

Grossmann Steffen, Bauer Sebastian, Robinson Peter N, Vingron Martin

Affiliations

Max-Planck-Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.

Publication information

Bioinformatics. 2007 Nov 15;23(22):3024-31. doi: 10.1093/bioinformatics/btm440. Epub 2007 Sep 11.

Abstract

MOTIVATION

High-throughput experiments such as microarray hybridizations often yield long lists of genes that share a certain characteristic, such as differential expression. Exploring Gene Ontology (GO) annotations for such gene lists has become a widespread way to gain first insights into the potential biological meaning of an experiment. The standard statistical approach to measuring overrepresentation of GO terms cannot cope with the dependencies resulting from the structure of GO because it analyzes each term in isolation. In particular, the fact that annotations are inherited from more specific descendant terms can produce certain types of false-positive results with potentially misleading biological interpretations, a phenomenon we term the inheritance problem.

RESULTS

We present a novel approach to the analysis of GO term overrepresentation that determines overrepresentation of a term in the context of the annotations to that term's parents. This approach reduces the dependencies between the individual terms' measurements and thereby avoids producing false-positive results owing to the inheritance problem. ROC analysis using study sets with overrepresented GO terms showed a clear advantage for our approach over the standard algorithm with respect to the inheritance problem. Although there can be no gold standard for exploratory methods such as analysis of GO term overrepresentation, analysis of biological datasets suggests that our algorithm tends to identify the core GO terms that are most characteristic of the dataset being analyzed.
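
The contrast between testing each term in isolation and testing it in the context of its parents can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy terms `root`, `A` and `B`, the gene identifiers, and the choice to use the union of genes annotated to a term's parents as the context are all assumptions made for illustration. Both tests use a one-sided hypergeometric tail; they differ only in which genes count as the background.

```python
from math import comb


def hypergeom_upper_tail(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) when sample_size genes are drawn without
    replacement from pop_size genes, pop_hits of which are annotated."""
    total = comb(pop_size, sample_size)
    upper = min(pop_hits, sample_size)
    favourable = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return favourable / total


def term_for_term_p(term, study, population, annotations):
    """Standard test: the background is the whole population set."""
    annotated = annotations[term] & population
    return hypergeom_upper_tail(
        len(population), len(annotated), len(study), len(annotated & study)
    )


def parent_child_p(term, study, population, annotations, parents):
    """Sketch of a parent-conditioned test.

    Assumption: the background is the union of genes annotated to any
    parent of the term. Terms without parents are not handled here.
    """
    context = set().union(*(annotations[p] for p in parents[term])) & population
    annotated = annotations[term] & context
    study_in_context = study & context
    return hypergeom_upper_tail(
        len(context),
        len(annotated),
        len(study_in_context),
        len(annotated & study_in_context),
    )


# Hypothetical toy data: 100 genes, term A (20 genes) with child term B
# (10 genes). Annotations are already propagated, so B's genes are in A.
population = set(range(100))
annotations = {
    "root": set(range(100)),
    "A": set(range(20)),
    "B": set(range(10)),
}
parents = {"A": ["root"], "B": ["A"]}

# Study set: 8 of 10 genes fall in A, and half of those fall in B,
# which is exactly the share of A's genes that B holds in the population.
study = {0, 1, 2, 3, 10, 11, 12, 13, 50, 51}

p_tft_B = term_for_term_p("B", study, population, annotations)
p_pc_B = parent_child_p("B", study, population, annotations, parents)
```

In this toy case `p_tft_B` falls below 0.01, so B looks overrepresented against the whole population, while `p_pc_B` is about 0.68: once the enrichment of its parent A is taken into account, B contains no more study genes than expected.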
