A posteriori metadata from automated provenance tracking: integration of AiiDA and TCOD.

Authors

Merkys Andrius, Mounet Nicolas, Cepellotti Andrea, Marzari Nicola, Gražulis Saulius, Pizzi Giovanni

Affiliations

Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.

Institute of Biotechnology, Vilnius University, Saulėtekio al. 7, 10257, Vilnius, Lithuania.

Publication information

J Cheminform. 2017 Nov 14;9(1):56. doi: 10.1186/s13321-017-0242-y.

DOI: 10.1186/s13321-017-0242-y
PMID: 29138947
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5686034/

Abstract

To make the results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, a number of technical and practical challenges make this difficult to achieve. Here the implementation of a protocol is presented to tag crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and the TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Based on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
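
The idea of deriving metadata a posteriori from a stored provenance graph can be illustrated with a small sketch. The code below is not the AiiDA-TCOD interface and uses neither the AiiDA API nor the TCOD dictionary; the graph layout, node identifiers, attribute names and the `_example_` tag prefix are all hypothetical, chosen only to show how walking back from a result node through its inputs can yield a flat set of tags with no manual curation.

```python
from collections import deque

# Toy provenance graph (hypothetical): each node lists the nodes it was
# produced from ("inputs") and a few attributes recorded at run time.
NODES = {
    "cif_in": {"attrs": {"source": "initial structure"}, "inputs": []},
    "params": {"attrs": {"cutoff_ry": 30}, "inputs": []},
    "relax": {"attrs": {"code": "example-code", "version": "1.0"},
              "inputs": ["cif_in", "params"]},
    "struct_out": {"attrs": {"total_energy_ev": -10.5}, "inputs": ["relax"]},
}


def ancestors(nodes, start):
    """Return `start` and every node reachable by following input links
    backwards, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in nodes[current]["inputs"]:
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


def collect_metadata(nodes, start):
    """Flatten the attributes of `start` and all its ancestors into a
    dictionary of tags. The `_example_` prefix is a placeholder, not a
    real dictionary namespace."""
    tags = {}
    for node_id in ancestors(nodes, start):
        for key, value in nodes[node_id]["attrs"].items():
            tags[f"_example_{node_id}_{key}"] = value
    return tags


def to_tag_lines(tags):
    """Render the tags as sorted 'name value' lines."""
    return [f"{name} {value!r}" for name, value in sorted(tags.items())]


if __name__ == "__main__":
    for line in to_tag_lines(collect_metadata(NODES, "struct_out")):
        print(line)
```

Because every tag comes from the recorded graph and not from a form filled in by hand, the same traversal could in principle be repeated for any deposited structure; a real exporter would map attributes onto the terms of the target ontology instead of the placeholder names used here.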
