Biotechnology Center, Technische Universität Dresden, 01062 Dresden, Germany.
Bioinformatics. 2010 Jun 15;26(12):i88-96. doi: 10.1093/bioinformatics/btq188.
Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontology (OBO) Foundry alone lists over 90 ontologies mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration.
We have developed the Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG), a system that supports the creation and extension of OBO ontologies by semi-automatically generating terms, definitions and parent-child relations from text in PubMed, the web and PDF repositories. DOG4DAG is seamlessly integrated into OBO-Edit. It generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. We systematically evaluate each generation step using manually validated benchmarks. Term generation yields high-quality terms that also appear in manually created ontologies. Up to 78% of the generated definitions are valid, and up to 54% of child-ancestor relations can be retrieved. No other validated system achieves comparable results. By combining the prediction of high-quality terms, definitions and parent-child relations with the ontology editor OBO-Edit, we contribute a thoroughly validated tool for all OBO ontology engineers.
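To make the idea of pattern-based relation extraction concrete, the following is a minimal sketch of a single lexical pattern ("X such as A, B and C") applied to one sentence. It is not DOG4DAG's implementation: the pattern, the function name `extract_relations`, and the restriction to single-word terms are assumptions made for illustration only, and the abstract does not state which patterns the system actually uses.

```python
import re

# Hypothetical, simplified pattern: "<parent> such as <child>, <child> and <child>".
# Assumption: terms are single words; a real system would work on noun phrases.
SUCH_AS = re.compile(
    r"\b(?P<parent>\w+) such as "
    r"(?P<children>\w+(?:(?:,? and |,? or |, )\w+)*)",
    re.IGNORECASE,
)

CONJUNCTIONS = {"and", "or"}


def extract_relations(sentence):
    """Return (child, parent) pairs suggested by the 'such as' pattern."""
    relations = []
    for match in SUCH_AS.finditer(sentence):
        parent = match.group("parent").lower()
        # Split the enumeration into words and drop the conjunctions.
        words = re.findall(r"\w+", match.group("children"))
        for word in words:
            child = word.lower()
            if child not in CONJUNCTIONS:
                relations.append((child, parent))
    return relations


if __name__ == "__main__":
    text = "Organelles such as mitochondria, lysosomes and peroxisomes are membrane-bound."
    for child, parent in extract_relations(text):
        print(f"{child} is_a {parent}")
```

Candidate pairs produced this way would still need manual review by a curator, which matches the semi-automatic workflow described above.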
DOG4DAG is available within OBO-Edit 2.1 at http://www.oboedit.org.
Supplementary data are available at Bioinformatics online.