Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/ Darwin 3, 28049 Madrid, Spain.
Bioinformatics. 2010 Feb 1;26(3):378-84. doi: 10.1093/bioinformatics/btp663. Epub 2009 Dec 4.
Gene Ontology (GO), the de facto standard for representing protein functional aspects, is being used beyond the primary goal for which it was designed: protein functional annotation. It is increasingly used to evaluate large sets of relationships between proteins, e.g. protein-protein interactions or mRNA co-expression, under the assumption that related proteins tend to have the same or similar GO terms. Nevertheless, this assumption holds only for terms representing functional groups with biological significance ('classes'), and not for those representing human-imposed aggregations or conceptualizations lacking a biological rationale ('categories').
Using a data-driven approach based on a set of high-quality functional associations, we quantify the functional coherence of GO biological process (GO:BP) terms, as well as of their explicit and implicit relationships, with the aim of distinguishing classes from categories. We show that this quantification agrees with the distinction one would intuitively make between these two concepts. As not all GO:BP terms and relationships are equally supported by current functional associations, any detailed validation of new experimental data using GO:BP, beyond whole-system statistics, should take this imbalance into account.
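The abstract does not state how functional coherence is computed, so the following is only an illustrative sketch and not the authors' measure. It assumes one simple possibility: score a term by the fraction of protein pairs annotated to it that are also linked in a reference set of functional associations. The function name `term_coherence`, the protein identifiers and the term labels are all hypothetical toy values.

```python
from itertools import combinations


def term_coherence(annotated_proteins, associations):
    """Fraction of protein pairs annotated to a term that are also linked
    by a known functional association (an assumed, simplified measure).

    Returns None when the term has fewer than two distinct proteins.
    """
    proteins = sorted(set(annotated_proteins))
    pairs = list(combinations(proteins, 2))
    if not pairs:
        return None
    linked = sum(1 for a, b in pairs if frozenset((a, b)) in associations)
    return linked / len(pairs)


# Hypothetical reference set of functional associations (unordered pairs).
associations = {
    frozenset(pair)
    for pair in [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P4", "P5")]
}

# Hypothetical annotations: term label -> proteins annotated to it.
annotations = {
    "term_A": ["P1", "P2", "P3"],        # every pair is associated
    "term_B": ["P1", "P4", "P6", "P7"],  # no pair is associated
}

scores = {
    term: term_coherence(proteins, associations)
    for term, proteins in annotations.items()
}
```

Under this toy measure, a term whose proteins are densely connected by functional associations behaves like a 'class', whereas a term whose proteins share few associations behaves like a 'category'. A real analysis would also need to account for term size and for the coverage of the association set.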
Supplementary data are available at Bioinformatics online.