Nat Biotechnol. 2010 Feb;28(2):128-30. doi: 10.1038/nbt0210-128.
The Gene Ontology (GO) and similar biomedical ontologies are critical tools in today's genetic research. These ontologies are crafted through a painstaking process of manual editing, and their organization relies on the intuition of human curators. Here we describe a method that uses information theory to automatically organize the structure of GO and optimize the distribution of information within it. We used this approach to analyze the evolution of GO, and we identified several areas where the information was suboptimally organized. We optimized the structure of GO and used the optimized version to analyze 10,117 gene expression signatures. The use of this new version changed the functional interpretations of 97.5% (p < 10^-3) of the signatures by, on average, 14.6%. As a result of this analysis, several changes will be introduced in the next releases of GO. We expect that these formal methods will become the standard for engineering biomedical ontologies.