Bettembourg Charles, Diot Christian, Dameron Olivier
Université de Rennes 1, Rennes, France ; UMR1348 PEGASE, INRA, Saint-Gilles, France ; UMR1348 PEGASE, Agrocampus OUEST, Rennes, France ; IRISA, Campus de Beaulieu, Rennes, France ; INRIA, Rennes, France.
UMR1348 PEGASE, INRA, Saint-Gilles, France ; UMR1348 PEGASE, Agrocampus OUEST, Rennes, France.
PLoS One. 2014 Jan 28;9(1):e86525. doi: 10.1371/journal.pone.0086525. eCollection 2014.
Genetic and genomic data analyses produce large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies the functions they share and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity.
We propose a new approach to compute gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed using either its information content, based on the term's frequency in a corpus, or a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared to another set of GO terms Sg2. We combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that the combination of semantic similarity and semantic particularity measures was able to identify genes with particular functions among similar genes. This differentiation was not detected when using a semantic similarity measure alone.
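The idea of term informativeness and of a directional comparison between two term sets can be illustrated with a minimal Python sketch. It is not the particularity formula defined in the paper, which the abstract does not give. The sketch assumes the common corpus-based definition of information content, IC(t) = log(N / n_t), where n_t is the number of genes annotated with t or one of its descendants among N genes. The ontology, the annotations and the function `toy_particularity` are hypothetical and exist only for illustration.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO identifiers.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
    "lipid_transport": {"transport"},
}

# Hypothetical annotation corpus: gene -> directly annotated terms.
ANNOTATIONS = {
    "g1": {"lipid_metabolism"},
    "g2": {"glucose_metabolism"},
    "g3": {"lipid_transport"},
    "g4": {"lipid_metabolism", "lipid_transport"},
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set(terms)
    stack = list(terms)
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """Corpus-based IC: log(N / n_t), counting each gene once per term."""
    n_genes = len(annotations)
    counts = {term: 0 for term in PARENTS}
    for terms in annotations.values():
        for term in closure(terms):
            counts[term] += 1
    # Terms that never occur in the corpus are skipped (their IC is undefined).
    return {t: math.log(n_genes / c) for t, c in counts.items() if c > 0}


def toy_particularity(sg1, sg2, ic):
    """Illustrative assumption, not the paper's measure: the share of the
    information carried by Sg1 (with ancestors) that Sg2 does not cover."""
    c1, c2 = closure(sg1), closure(sg2)
    total = sum(ic[t] for t in c1)
    if total == 0:
        return 0.0
    return sum(ic[t] for t in c1 - c2) / total


ic = information_content(ANNOTATIONS)
par_g4_vs_g1 = toy_particularity(ANNOTATIONS["g4"], ANNOTATIONS["g1"], ic)
par_g1_vs_g4 = toy_particularity(ANNOTATIONS["g1"], ANNOTATIONS["g4"], ic)
```

In this toy corpus, g4 carries transport-related information that g1 lacks, so `par_g4_vs_g1` is positive, while every term of g1 is covered by g4, so `par_g1_vs_g4` is 0.0. The asymmetry is the point of the illustration: two term sets can overlap heavily and still differ in what each one contributes on its own.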
Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.