Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Bioinformatics. 2010 Jun 15;26(12):i71-8. doi: 10.1093/bioinformatics/btq173.
Recent years have seen the development of a wide range of biomedical ontologies. Notable among these is the Sequence Ontology (SO), which offers a rich hierarchy of terms and relationships that can be used to annotate genomic data. Well-designed formal ontologies allow data to be reasoned over in a consistent and logically sound way and can lead to the discovery of new relationships. The Semantic Web Rule Language (SWRL) augments the capabilities of a reasoner by allowing the creation of conditional rules. To date, however, formal reasoning, and SWRL rules in particular, has not been widely used in biomedicine.
We have built a knowledge base of human pseudogenes, extending the existing SO framework to incorporate additional attributes. In particular, we have defined the relationships between pseudogenes and segmental duplications. We then created a series of logical rules using SWRL to answer research questions and to annotate our pseudogenes appropriately. The result is a knowledge base that can be queried to discover information about human pseudogene evolution.
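To illustrate what a conditional rule of this kind does, the following is a minimal Python sketch and not the authors' SWRL. The identifiers (`pg1`, `sd1`), the `located_in` property and the `DuplicatedPseudogene` class are hypothetical names chosen for illustration; the actual classes and rules in the knowledge base may differ.

```python
# Hypothetical facts as (subject, predicate, object) triples.
# None of these identifiers are taken from the published knowledge base.
facts = {
    ("pg1", "type", "Pseudogene"),
    ("pg2", "type", "Pseudogene"),
    ("sd1", "type", "SegmentalDuplication"),
    ("pg1", "located_in", "sd1"),
}


def apply_rule(triples):
    """Apply one SWRL-style rule (hypothetical):

    Pseudogene(?p) ^ located_in(?p, ?d) ^ SegmentalDuplication(?d)
        -> DuplicatedPseudogene(?p)
    """
    inferred = set()
    for s, p, o in triples:
        if (
            p == "located_in"
            and (s, "type", "Pseudogene") in triples
            and (o, "type", "SegmentalDuplication") in triples
        ):
            inferred.add((s, "type", "DuplicatedPseudogene"))
    # Return the original facts together with the newly inferred ones.
    return triples | inferred


kb = apply_rule(facts)
duplicated = sorted(
    s for s, p, o in kb if p == "type" and o == "DuplicatedPseudogene"
)
print(duplicated)
```

Only `pg1` satisfies all three conditions of the rule, so it is the only individual that receives the inferred annotation; `pg2` is left unchanged.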
The fully populated knowledge base described in this document is available for download from http://ontology.pseudogene.org. A SPARQL endpoint from which to query the dataset is also available at this location.
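As a rough sketch of how such an endpoint might be queried, the snippet below builds a SPARQL-protocol GET request URL using only the Python standard library. It sends nothing over the network. The exact endpoint path is not given in the text, so the base URL is used as a placeholder, and the `ex:` prefix and `ex:DuplicatedPseudogene` class are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL from the text; the exact SPARQL endpoint path is an assumption.
ENDPOINT = "http://ontology.pseudogene.org"

# Hypothetical query: the prefix and class name are placeholders only.
QUERY = """\
PREFIX ex: <http://example.org/hypothetical#>
SELECT ?pseudogene
WHERE { ?pseudogene a ex:DuplicatedPseudogene . }
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request,
    as defined by the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Decode the URL again to confirm the query survives encoding intact.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format such as `application/sparql-results+json`.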