Department of Computer Science and Engineering, Mississippi State University, MS, USA.
J Biomed Inform. 2013 Oct;46(5):849-56. doi: 10.1016/j.jbi.2013.06.012. Epub 2013 Jul 11.
The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies for describing gene product characteristics. GO annotation data, which contains terms from multiple sub-ontologies at different levels of those ontologies, is an important source of implicit relationships between terms from the three sub-ontologies. Effective knowledge discovery from GO annotation data requires data mining techniques, such as association rule mining, that are tailored to mine from multiple ontologies at multiple levels of abstraction. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL), that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures, Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence), customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods in two applications: (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships.
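To make the idea of multi-level, cross-ontology rule mining concrete, the following is a minimal sketch and not the MOAL algorithm itself. The abstract does not define MOSupport or MOConfidence, so the sketch assumes the classical definitions of support and confidence, applied to annotation sets that have been extended with every ancestor term. The toy ontology, the `MF:`/`BP:` term labels, the gene annotations, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy ontology: child term -> set of parent terms.
# The "MF:" / "BP:" prefixes are invented labels marking the sub-ontology.
PARENTS = {
    "MF:kinase": {"MF:catalytic"},
    "MF:catalytic": set(),
    "BP:phosphorylation": {"BP:metabolism"},
    "BP:metabolism": set(),
}

# Hypothetical annotation data: gene -> directly annotated terms.
GENES = {
    "g1": {"MF:kinase", "BP:phosphorylation"},
    "g2": {"MF:kinase", "BP:phosphorylation"},
    "g3": {"MF:catalytic", "BP:metabolism"},
    "g4": {"MF:kinase"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def generalize(annotations):
    """Extend an annotation set with every ancestor (all levels)."""
    closed = set()
    for term in annotations:
        closed |= ancestors(term)
    return closed


# One "transaction" per gene, covering all levels of abstraction.
TRANSACTIONS = [generalize(terms) for terms in GENES.values()]


def support(items):
    """Classical support: fraction of transactions containing all items."""
    items = set(items)
    return sum(items <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def confidence(lhs, rhs):
    """Classical confidence of the rule lhs -> rhs."""
    denom = support(lhs)
    return support(set(lhs) | set(rhs)) / denom if denom else 0.0


def cross_ontology_rules(min_support, min_confidence):
    """List single-term rules whose two sides come from different sub-ontologies."""
    terms = sorted(set().union(*TRANSACTIONS))
    rules = []
    for lhs, rhs in product(terms, terms):
        if lhs.split(":")[0] == rhs.split(":")[0]:
            continue  # keep only cross-ontology pairs
        s = support({lhs, rhs})
        c = confidence({lhs}, {rhs})
        if s >= min_support and c >= min_confidence:
            rules.append((lhs, rhs, s, c))
    return rules
```

Calling `cross_ontology_rules(0.5, 0.6)` on this toy data returns rules at both the specific level (for example `MF:kinase -> BP:phosphorylation`) and the general level (for example `MF:catalytic -> BP:metabolism`). Generalized rules of this kind tend to have high support but may be less informative, which is one reason post-processing to prune uninteresting rules is needed.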