Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS, USA.
PLoS One. 2012;7(10):e47411. doi: 10.1371/journal.pone.0047411. Epub 2012 Oct 12.
The Gene Ontology (GO) has become the internationally accepted standard for representing the function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge about the relationships among these aspects. We describe a new association rule mining method that discovers implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data, and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. We show that, as evaluated by biologists, our approach discovers more association rules, and rules of higher quality, from the GO than previously published methods. Biologically interesting rules discovered by our method reveal previously unknown and surprising knowledge about co-occurring GO terms.
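The abstract's core idea (generalize annotated terms upward, then mine rules between terms of different sub-ontologies) can be illustrated with a minimal sketch. This is not the COLL procedure itself, whose details the abstract does not give. The term names, the `PARENT` map, the `ANNOTATIONS` data, and the function names are all hypothetical, and the hierarchy is simplified to a single parent per term, whereas the real GO is a directed acyclic graph in which a term may have several parents.

```python
from collections import Counter
from itertools import product

# Hypothetical toy hierarchy (child -> parent). Real GO terms can have
# multiple parents; a single parent keeps the sketch short.
PARENT = {
    "MF:kinase": "MF:catalytic",
    "MF:phosphatase": "MF:catalytic",
    "CC:nucleolus": "CC:nucleus",
    "CC:chromatin": "CC:nucleus",
}

# Hypothetical annotation data: gene product -> set of annotated terms.
ANNOTATIONS = {
    "gene1": {"MF:kinase", "CC:nucleolus"},
    "gene2": {"MF:phosphatase", "CC:chromatin"},
    "gene3": {"MF:kinase", "CC:chromatin"},
    "gene4": {"MF:phosphatase", "CC:nucleolus"},
}


def generalize(term, levels):
    """Climb up to `levels` steps toward the root, stopping at a root."""
    for _ in range(levels):
        term = PARENT.get(term, term)
    return term


def ontology(term):
    """Sub-ontology prefix of a term, e.g. 'MF' for 'MF:kinase'."""
    return term.split(":", 1)[0]


def cross_ontology_rules(annotations, levels, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for single-term rules whose two sides come from different sub-ontologies."""
    # One generalized transaction per gene product.
    transactions = [
        frozenset(generalize(t, levels) for t in terms)
        for terms in annotations.values()
    ]
    n = len(transactions)
    item_counts = Counter(t for tx in transactions for t in tx)
    pair_counts = Counter()
    for tx in transactions:
        for a, b in product(tx, tx):
            if ontology(a) != ontology(b):
                pair_counts[(a, b)] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for level in (0, 1):
        print(level, cross_ontology_rules(ANNOTATIONS, level, 0.5, 0.5))
```

In this toy data no rule passes the thresholds at the most specific level, because each specific term pair occurs in only one of four transactions. After one generalization step, every transaction contains both `MF:catalytic` and `CC:nucleus`, so the rule between them appears in both directions. That is the kind of relationship single-level mining would miss.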