NCB Naturalis, Leiden, The Netherlands.
Syst Biol. 2012 Jul;61(4):675-89. doi: 10.1093/sysbio/sys025. Epub 2012 Feb 22.
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. Making such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages commonly used in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
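To make the structure described above concrete, the sketch below queries a small NeXML-style document with Python's standard-library `xml.etree.ElementTree`. It reads the OTU block, one tree (nodes linked to OTUs, edges with lengths), and a `meta` annotation attached to the tree. The document is hand-written and simplified for illustration. It assumes the `http://www.nexml.org/2009` namespace and has not been validated against the official NeXML schema. The names `NEXML_DOC` and `summarize` are hypothetical and do not belong to any of the NeXML toolkits mentioned in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NeXML-style document written for illustration only.
# Assumes the http://www.nexml.org/2009 namespace; not checked against the XSD.
NEXML_DOC = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <meta xsi:type="nex:LiteralMeta" property="dc:creator" content="example"/>
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
      <edge id="e2" source="n1" target="n3" length="0.7"/>
    </tree>
  </trees>
</nex:nexml>"""

# Prefix used in the ElementTree path expressions below.
NS = {"nex": "http://www.nexml.org/2009"}


def summarize(doc: str) -> dict:
    """Hypothetical helper: collect OTU labels, tip labels, total edge length,
    and tree-level annotations from a NeXML-style string."""
    root = ET.fromstring(doc)

    # Map each OTU id to its human-readable label.
    otu_labels = {
        otu.get("id"): otu.get("label")
        for otu in root.iterfind("nex:otus/nex:otu", NS)
    }

    trees = []
    for tree in root.iterfind("nex:trees/nex:tree", NS):
        # Nodes that reference an OTU are treated as tips in this sketch.
        tips = sorted(
            otu_labels[node.get("otu")]
            for node in tree.iterfind("nex:node", NS)
            if node.get("otu") is not None
        )
        # Sum edge lengths; edges without a length attribute count as 0.
        total_length = sum(
            float(edge.get("length", 0)) for edge in tree.iterfind("nex:edge", NS)
        )
        # Tree-level annotations expressed as meta elements.
        annotations = {
            meta.get("property"): meta.get("content")
            for meta in tree.iterfind("nex:meta", NS)
        }
        trees.append(
            {"id": tree.get("id"), "tips": tips,
             "length": total_length, "meta": annotations}
        )

    return {"otus": otu_labels, "trees": trees}


if __name__ == "__main__":
    print(summarize(NEXML_DOC))
```

Because every OTU, node, and edge carries an `id`, the same look-up pattern extends to character-state matrices, which refer to OTUs in the same way. In practice, the dedicated NeXML toolkits and schema validation would replace this kind of ad hoc parsing.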