Köhler Jacob, Munn Katherine, Rüegg Alexander, Skusa Andre, Smith Barry
Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, UK.
BMC Bioinformatics. 2006 Apr 19;7:212. doi: 10.1186/1471-2105-7-212.
Ontologies and taxonomies are among the most important computational resources for molecular biology and bioinformatics. A series of recent papers has shown that the Gene Ontology (GO), the most prominent taxonomic resource in these fields, is marked by flaws of certain characteristic types, which flow from a failure to address basic ontological principles. As yet, no methods have been proposed which would allow ontology curators to pinpoint flawed terms or definitions in ontologies in a systematic way.
We present computational methods that automatically identify terms and definitions that are circular or unintelligible. We demonstrate the potential of these methods by applying them to isolate a subset of 6001 problematic GO terms. By automatically aligning GO with other ontologies and taxonomies, we were able to propose alternative synonyms and definitions for some of these problematic terms. This alignment also allows us to demonstrate that these other resources do not contain definitions superior to those supplied by GO.
Our methods provide reliable indications of the quality of terms and definitions in ontologies and taxonomies. They are also well suited to drawing ontology curators' attention to ill-defined terms. We have further shown the limitations of ontology mapping and alignment in helping curators rectify these problems, which points to the need for manual curation.
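The abstract does not spell out how circular definitions are detected, so the following is only a minimal sketch of one plausible heuristic, not the authors' method: a definition is flagged when it reuses every content word of the term it defines. The stopword list, the function names (`content_words`, `circularity`, `flag_circular`), the threshold, and the sample entries are all assumptions introduced for illustration.

```python
import re

# Hypothetical stopword list; the source does not specify one.
STOPWORDS = frozenset({
    "a", "an", "the", "of", "in", "to", "and", "or", "by",
    "for", "with", "that", "which", "is", "are", "any",
})


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def circularity(term, definition):
    """Fraction of the term's content words that reappear in its definition."""
    term_words = content_words(term)
    if not term_words:
        return 0.0
    shared = term_words & content_words(definition)
    return len(shared) / len(term_words)


def flag_circular(entries, threshold=1.0):
    """Return the terms whose circularity score reaches the threshold."""
    return [
        term
        for term, definition in entries.items()
        if circularity(term, definition) >= threshold
    ]


# Made-up entries for illustration; not taken from GO.
entries = {
    "cell growth": "The growth of a cell.",
    "apoptosis": "A form of programmed cell death.",
}

print(flag_circular(entries))  # prints ['cell growth']
```

A score below 1.0 could be used as a softer threshold to surface partially circular definitions for a curator to review, at the cost of more false positives.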