Kochev Nikolay, Jeliazkova Nina, Paskaleva Vesselina, Tancheva Gergana, Iliev Luchesar, Ritchie Peter, Jeliazkov Vedrin
Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Assen St, 4000 Plovdiv, Bulgaria.
Ideaconsult Ltd., 4 Angel Kanchev St, 1000 Sofia, Bulgaria.
Nanomaterials (Basel). 2020 Sep 24;10(10):1908. doi: 10.3390/nano10101908.
The field of nanoinformatics is rapidly developing and provides data-driven solutions in the area of nanomaterials (NM) safety. Safe-by-Design approaches are encouraged and promoted through regulatory initiatives and multiple scientific projects. Experimental data is at the core of nanoinformatics processing workflows for risk assessment. Nanosafety data is predominantly recorded in Excel spreadsheet files. Although spreadsheets are convenient for experimentalists, they pose great challenges for subsequent processing into databases because of the variability of the templates used, the specific details provided by each laboratory, and the need for proper metadata documentation and formatting. In this paper, we present a workflow to facilitate the conversion of spreadsheets into a FAIR (Findable, Accessible, Interoperable, and Reusable) database, with the pivotal aid of the NMDataParser tool, developed to streamline the mapping of the original file layout into the eNanoMapper semantic data model. The NMDataParser is an open-source Java library and application that uses a JSON configuration to define the mapping. We describe the JSON configuration syntax and the approaches applied for parsing different spreadsheet layouts used by the nanosafety community. Examples of using the NMDataParser tool in nanoinformatics workflows are given. Challenging cases are discussed and appropriate solutions are proposed.
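To illustrate the general idea of a configuration-driven mapping from a spreadsheet layout to structured records, here is a minimal Python sketch. It is not the NMDataParser configuration syntax, which the abstract does not show: the keys `header_row` and `columns`, the function `map_rows`, and the sample rows are all hypothetical and made up for illustration.

```python
import json

# Hypothetical configuration, NOT the actual NMDataParser JSON syntax.
# It says which row holds the headers and which column index feeds each field.
CONFIG_JSON = """
{
  "header_row": 0,
  "columns": {
    "material": 0,
    "endpoint": 1,
    "value": 2,
    "unit": 3
  }
}
"""


def map_rows(rows, config):
    """Turn raw spreadsheet rows (lists of cell strings) into records.

    The column layout comes from the configuration, so a different
    template only needs a different configuration, not different code.
    """
    data_rows = rows[config["header_row"] + 1:]
    records = []
    for row in data_rows:
        record = {field: row[index] for field, index in config["columns"].items()}
        record["value"] = float(record["value"])  # numeric results are stored as numbers
        records.append(record)
    return records


config = json.loads(CONFIG_JSON)

# Made-up rows standing in for the cells read from one spreadsheet sheet.
rows = [
    ["Material", "Endpoint", "Result", "Unit"],
    ["sample-A", "Cell viability", "87.5", "%"],
    ["sample-B", "Zeta potential", "-21.3", "mV"],
]

records = map_rows(rows, config)
for record in records:
    print(record)
```

Keeping the layout description in a separate JSON document means that a laboratory template with a different column order would, in this sketch, only require changing the indices under `columns`.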