Ochs Christopher, Perl Yehoshua, Halper Michael, Geller James, Lomax Jane
* Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA.
† Information Technology Department, New Jersey Institute of Technology, Newark, NJ 07102, USA.
J Bioinform Comput Biol. 2016 Jun;14(3):1642001. doi: 10.1142/S0219720016420014. Epub 2015 Nov 24.
The Gene Ontology (GO) is used extensively in the field of genomics. As with other large and complex ontologies, quality assurance (QA) of GO's content can be laborious and time consuming. Abstraction networks (AbNs) are summarization networks that reveal and highlight high-level structural and hierarchical aggregation patterns in an ontology. They have been shown to successfully support QA work in the context of various ontologies. Two kinds of AbNs, called the area taxonomy and the partial-area taxonomy, are developed for GO hierarchies and derived specifically for the biological process (BP) hierarchy. Within this framework, several QA heuristics are introduced, each based on identifying groups of anomalous terms that exhibit certain taxonomy-defined characteristics. Such groups are expected to have higher error rates than other terms. Thus, by focusing QA efforts on anomalous terms, one would expect to find relatively more erroneous content. Automatically identifying these potential problem areas within an ontology can save time and effort during manual reviews of GO's content. BP is used as a testbed, with samples of three kinds of anomalous BP terms chosen for a taxonomy-based QA review. Additional heuristics for QA are demonstrated. The results of this QA effort show that the proposed heuristics can expose different kinds of inconsistencies in the modeling of GO. For comparison, the results of QA work on a sample of terms chosen from GO's general population are presented.