Department Biology II, Ludwig-Maximilians-Universität München, Martinsried, Germany.
Front Neuroinform. 2011 Aug 30;5:16. doi: 10.3389/fninf.2011.00016. eCollection 2011.
Metadata providing information about the stimulus, data acquisition, and experimental conditions are indispensable for the analysis and management of experimental data within a lab. However, only rarely are metadata available in a structured, comprehensive, and machine-readable form. This poses a severe problem for finding and retrieving data, both in the laboratory and in the various emerging public databases. Here, we propose a simple format, the "open metaData Markup Language" (odML), for collecting and exchanging metadata in an automated, computer-based fashion. In odML, arbitrary metadata are stored as extended key-value pairs in a hierarchical structure. Central to odML is a clear separation of format and content, i.e., neither keys nor values are defined by the format. This makes odML flexible enough to store all available metadata immediately, without the need to submit new keys to an ontology or controlled terminology. Common standard keys can be defined in odML terminologies to guarantee interoperability. We have started to define such terminologies for neurophysiological data, but aim at a community-driven extension and refinement of the proposed definitions. Using customized terminologies that map onto these standard terminologies, metadata can be named and organized as required or preferred without softening the standard. Together with the libraries provided for common programming languages, the odML format can be integrated into the laboratory workflow, facilitating the automated collection of metadata wherever they become available. The flexibility of odML also encourages a community-driven collection and definition of terms used for annotating data in the neurosciences.
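The abstract describes odML's data model only in outline: metadata as extended key-value pairs arranged in a hierarchy, with key names left to the user and optionally mapped onto a standard terminology. The plain-Python sketch below illustrates that idea under stated assumptions. It is not the API of the odML libraries mentioned above; the classes `Section` and `Property`, their fields (`unit`, `uncertainty`, `definition`), the `/`-separated path syntax, and the `apply_mapping` helper are hypothetical names chosen for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Property:
    """An 'extended' key-value pair: a value plus optional annotations.

    Hypothetical illustration only; these field names are assumptions,
    not the actual odML library API.
    """
    name: str
    value: Any
    unit: Optional[str] = None           # e.g. "Hz"
    uncertainty: Optional[float] = None  # e.g. measurement error
    definition: Optional[str] = None     # free-text meaning of the key


@dataclass
class Section:
    """A named node in the hierarchy that holds properties and subsections."""
    name: str
    type: str = ""
    properties: dict[str, Property] = field(default_factory=dict)
    sections: dict[str, Section] = field(default_factory=dict)

    def add_property(self, prop: Property) -> Property:
        self.properties[prop.name] = prop
        return prop

    def add_section(self, sec: Section) -> Section:
        self.sections[sec.name] = sec
        return sec

    def find(self, path: str) -> Property:
        """Resolve a '/'-separated path such as 'Stim/freq' (hypothetical syntax)."""
        *section_names, prop_name = path.split("/")
        node = self
        for name in section_names:
            node = node.sections[name]
        return node.properties[prop_name]


def apply_mapping(root: Section, mapping: dict[str, str]) -> dict[str, Property]:
    """Re-key every property by a standard terminology (hypothetical helper).

    `mapping` sends a custom path (e.g. 'Stim/freq') to a standard path
    (e.g. 'Stimulus/Frequency'); unmapped properties keep their own path.
    The stored metadata themselves are left unchanged.
    """
    result: dict[str, Property] = {}

    def walk(sec: Section, prefix: str) -> None:
        for prop in sec.properties.values():
            path = f"{prefix}{prop.name}"
            result[mapping.get(path, path)] = prop
        for sub in sec.sections.values():
            walk(sub, f"{prefix}{sub.name}/")

    walk(root, "")
    return result


# Example: a custom layout, re-keyed to a (hypothetical) standard terminology.
root = Section("Recording")
stim = root.add_section(Section("Stim", type="stimulus"))
stim.add_property(Property("freq", 440.0, unit="Hz"))
standard = apply_mapping(root, {"Stim/freq": "Stimulus/Frequency"})
print(standard["Stimulus/Frequency"].value)  # 440.0
```

Because the format itself never fixes key names, `Stim/freq` is stored exactly as the user wrote it; only the optional mapping step relates it to the (hypothetical) standard key `Stimulus/Frequency`, mirroring the separation of format and content described in the abstract.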