Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America.
PLoS Comput Biol. 2012;8(2):e1002386. doi: 10.1371/journal.pcbi.1002386. Epub 2012 Feb 16.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the "functional similarity" between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the "ortholog conjecture" (or, more properly, the "ortholog functional conservation hypothesis"). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an "open world assumption" (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
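The effect of the open world assumption combined with ascertainment bias can be illustrated with a toy calculation. The sketch below is not the metric of Nehrt et al.; it substitutes a plain Jaccard index over annotation sets. The term names and the assignment of experiment types to species are hypothetical, chosen only to show how complementary annotations can make a functionally identical ortholog pair score lower than a partially divergent paralog pair.

```python
def jaccard(a, b):
    """Jaccard index of two annotation sets (a stand-in similarity, assumed for illustration)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical "true" functions. The ortholog pair shares all of them.
human_true = {"molecular_term_1", "molecular_term_2", "process_term_1", "process_term_2"}
mouse_true = set(human_true)
# A hypothetical mouse paralog that has partially diverged.
paralog_true = {"molecular_term_1", "process_term_1", "process_term_3"}

# Assumed ascertainment bias: the human gene was studied with biochemical
# assays (molecular terms only), the mouse genes with phenotype experiments
# (process terms only). Unannotated functions are unknown, not absent.
human_annotated = {t for t in human_true if t.startswith("molecular")}
mouse_annotated = {t for t in mouse_true if t.startswith("process")}
paralog_annotated = {t for t in paralog_true if t.startswith("process")}

true_ortholog_sim = jaccard(human_true, mouse_true)
true_paralog_sim = jaccard(mouse_true, paralog_true)
observed_ortholog_sim = jaccard(human_annotated, mouse_annotated)
observed_paralog_sim = jaccard(mouse_annotated, paralog_annotated)

print(f"true:     ortholog={true_ortholog_sim:.2f}  paralog={true_paralog_sim:.2f}")
print(f"observed: ortholog={observed_ortholog_sim:.2f}  paralog={observed_paralog_sim:.2f}")
```

In this toy setting the ranking of the two pairs reverses once only the annotated subsets are compared, although the underlying functions of the orthologs are identical. The reversal comes from which experiments were performed on each gene and does not reflect any difference in function.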