Midford Peter E, Dececchi Thomas Alex, Balhoff James P, Dahdul Wasila M, Ibrahim Nizar, Lapp Hilmar, Lundberg John G, Mabee Paula M, Sereno Paul C, Westerfield Monte, Vision Todd J, Blackburn David C
Department of Vertebrate Zoology and Anthropology, California Academy of Sciences, San Francisco, California, USA.
J Biomed Semantics. 2013 Nov 22;4(1):34. doi: 10.1186/2041-1480-4-34.
A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships.
As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), a semantically defined taxonomic resource derived from the integration of existing taxonomic compilations, and freely distributed under a Creative Commons Zero (CC0) public domain waiver. The VTO includes both extant and extinct vertebrates and currently contains 106,947 taxonomic terms, 22 taxonomic ranks, 104,736 synonyms, and 162,400 cross-references to other taxonomic resources. Key challenges in constructing the VTO included (1) extracting and merging names, synonyms, and identifiers from heterogeneous sources; (2) structuring hierarchies of terms based on evolutionary relationships and the principle of monophyly; and (3) automating this process as much as possible to accommodate updates in source taxonomies.
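The first challenge listed above, merging names, synonyms, and identifiers from heterogeneous sources, can be illustrated with a minimal sketch. It is not the VTO build pipeline: the `Taxon` class, the `merge_sources` function, the record layout, and the `SRC_A`/`SRC_B` identifier prefixes are all hypothetical, and matching on a normalized name alone is a simplifying assumption that would not resolve homonyms or conflicting ranks.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Taxon:
    """One merged taxonomic term (hypothetical structure, not the VTO schema)."""
    name: str
    rank: str
    synonyms: Set[str] = field(default_factory=set)
    xrefs: Set[str] = field(default_factory=set)


def merge_sources(*sources: Iterable[dict]) -> Dict[str, Taxon]:
    """Merge records from several sources, keyed by the normalized valid name.

    Assumption: two records describe the same taxon when their names match
    after trimming whitespace and lower-casing.
    """
    merged: Dict[str, Taxon] = {}
    for records in sources:
        for rec in records:
            clean_name = rec["name"].strip()
            key = clean_name.lower()
            taxon = merged.get(key)
            if taxon is None:
                # First time this name is seen: create the merged term.
                taxon = Taxon(name=clean_name, rank=rec["rank"])
                merged[key] = taxon
            # Pool synonyms and keep each source identifier as a cross-reference.
            taxon.synonyms.update(rec.get("synonyms", []))
            taxon.xrefs.add(rec["id"])
    return merged


# Illustrative records only; the identifiers are made up.
source_a: List[dict] = [
    {"id": "SRC_A:1", "name": "Danio rerio", "rank": "species",
     "synonyms": ["Brachydanio rerio"]},
]
source_b: List[dict] = [
    {"id": "SRC_B:42", "name": "Danio rerio ", "rank": "species",
     "synonyms": ["Cyprinus rerio"]},
]

merged = merge_sources(source_a, source_b)
```

In this sketch each merged term keeps one cross-reference per contributing source, which mirrors in spirit the cross-references to other taxonomic resources counted above.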
The VTO is the primary source of taxonomic information used by the Phenoscape Knowledgebase (http://phenoscape.org/), which integrates genetic and evolutionary phenotype data across both model and non-model vertebrates. The VTO is useful for inferring phenotypic changes on the vertebrate tree of life, which enables queries for candidate genes for various episodes in vertebrate evolution.