Merkys Andrius, Mounet Nicolas, Cepellotti Andrea, Marzari Nicola, Gražulis Saulius, Pizzi Giovanni
Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.
Institute of Biotechnology, Vilnius University, Saulėtekio al. 7, 10257, Vilnius, Lithuania.
J Cheminform. 2017 Nov 14;9(1):56. doi: 10.1186/s13321-017-0242-y.
To make the results of computational scientific research findable, accessible, interoperable and re-usable, they must be annotated with standardised metadata. However, a number of technical and practical challenges make this difficult to achieve. Here the implementation of a protocol is presented that tags crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and the TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Building on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
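The idea of extracting metadata from a provenance graph can be illustrated with a minimal sketch. It does not use the AiiDA API or the actual TCOD dictionaries: the toy graph, the node identifiers, the attribute names, the `_example_` tag prefix and the helper functions `collect_provenance` and `build_metadata` are all hypothetical, chosen only to show how walking from a result back through its inputs could yield tag/value pairs without manual curation.

```python
from collections import deque

# Toy provenance graph (hypothetical data, not an AiiDA export).
# Each node records its type, its attributes, and the ids of its inputs.
NODES = {
    "structure_in": {"type": "structure", "attrs": {"formula": "Si2"}, "inputs": []},
    "code": {"type": "code", "attrs": {"name": "example-dft-code", "version": "1.0"}, "inputs": []},
    "params": {"type": "parameters", "attrs": {"cutoff_ry": 30.0}, "inputs": []},
    "calc": {"type": "calculation", "attrs": {}, "inputs": ["structure_in", "code", "params"]},
    "structure_out": {"type": "structure", "attrs": {"formula": "Si2"}, "inputs": ["calc"]},
    "results": {"type": "parameters", "attrs": {"energy_ev": -12.5}, "inputs": ["calc"]},
}


def collect_provenance(nodes, start_ids):
    """Return the start nodes and all of their ancestors, in breadth-first order."""
    seen = []
    queue = deque(start_ids)
    while queue:
        node_id = queue.popleft()
        if node_id in seen:
            continue  # a node reachable through several paths is visited once
        seen.append(node_id)
        queue.extend(nodes[node_id]["inputs"])
    return seen


def build_metadata(nodes, start_ids):
    """Flatten the attributes of every node in the provenance into tag/value pairs."""
    tags = {}
    for node_id in collect_provenance(nodes, start_ids):
        for key, value in nodes[node_id]["attrs"].items():
            # Placeholder tag naming; a real deposition would map to dictionary-defined tags.
            tags[f"_example_{node_id}_{key}"] = value
    return tags


if __name__ == "__main__":
    # Start from the nodes to be deposited: the final structure and its computed results.
    for tag, value in build_metadata(NODES, ["structure_out", "results"]).items():
        print(tag, value)
```

Starting the traversal from the deposited nodes rather than from the inputs means only the calculations that actually contributed to the result are included, which is the property that makes a provenance graph usable as a source of metadata.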