Ammar Ammar, Bonaretti Serena, Winckers Laurent, Quik Joris, Bakker Martine, Maier Dieter, Lynch Iseult, van Rijn Jeaphianne, Willighagen Egon
Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, NL-6200 MD Maastricht, The Netherlands.
Transparent MSK Research, NL-6221 BN Maastricht, The Netherlands.
Nanomaterials (Basel). 2020 Oct 20;10(10):2068. doi: 10.3390/nano10102068.
Data sharing and reuse are crucial to enhance scientific progress and maximize the return on investment in science. Although attitudes are increasingly favorable, data reuse remains difficult due to a lack of infrastructure, standards, and policies. The FAIR (findable, accessible, interoperable, reusable) principles aim to provide recommendations to increase data reuse. Because the FAIR principles can be interpreted broadly, maturity indicators are necessary to determine the FAIRness of a dataset. In this work, we propose a reproducible computational workflow to assess data FAIRness in the life sciences. Our implementation follows principles and guidelines recommended by the maturity indicator authoring group and integrates concepts from the literature. In addition, we propose a FAIR balloon plot to summarize and compare dataset FAIRness. We evaluated the feasibility of our method on three real use cases where researchers looked for six datasets to answer their scientific questions. We retrieved information from repositories (ArrayExpress, Gene Expression Omnibus, eNanoMapper, caNanoLab, NanoCommons and ChEMBL), a registry of repositories, and a searchable resource (Google Dataset Search) via application programming interfaces (APIs) wherever possible. With our analysis, we found that the six datasets met the majority of the criteria defined by the maturity indicators, and we showed areas where improvements can easily be reached. We suggest that the use of standard metadata schemas and the presence of specific attributes in registries of repositories could increase the FAIRness of datasets.
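As a rough illustration of what assessing a dataset against maturity indicators can look like in code, the sketch below scores one metadata record with four pass/fail checks, one per FAIR principle. It is not the workflow described in the paper: the field names (`identifier`, `url`, `schema`, `license`), the check definitions, and the `assess` and `fraction_met` helpers are all hypothetical, chosen only to show the general shape of such an evaluation.

```python
from typing import Callable, Dict

# Hypothetical metadata record: field names are illustrative assumptions,
# not the fields used by any specific repository or by the paper's workflow.
Metadata = Dict[str, str]

# One simplified, hypothetical check per FAIR principle.
CHECKS: Dict[str, Callable[[Metadata], bool]] = {
    "F: DOI-style identifier present":
        lambda md: md.get("identifier", "").startswith(("doi:", "https://doi.org/")),
    "A: landing page reachable over HTTP(S)":
        lambda md: md.get("url", "").startswith(("http://", "https://")),
    "I: metadata declares a schema":
        lambda md: bool(md.get("schema")),
    "R: license stated":
        lambda md: bool(md.get("license")),
}


def assess(md: Metadata) -> Dict[str, bool]:
    """Run every check and return a pass/fail result per indicator."""
    return {name: check(md) for name, check in CHECKS.items()}


def fraction_met(results: Dict[str, bool]) -> float:
    """Share of indicators that passed, between 0.0 and 1.0."""
    return sum(results.values()) / len(results)


if __name__ == "__main__":
    # Made-up example record with no schema declared.
    record = {
        "identifier": "doi:10.0000/example",
        "url": "https://example.org/dataset/1",
        "license": "CC-BY-4.0",
    }
    results = assess(record)
    for name, passed in results.items():
        print(f"{'pass' if passed else 'fail'}  {name}")
    print(f"fraction met: {fraction_met(results):.2f}")
```

In a real assessment the record would be retrieved from a repository API rather than written by hand, and each indicator would follow its published specification instead of these simplified string checks.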