Rodríguez-Iglesias Alejandro, Rodríguez-González Alejandro, Irvine Alistair G, Sesma Ane, Urban Martin, Hammond-Kosack Kim E, Wilkinson Mark D
Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, Spain.
ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Madrid, Spain.
Front Plant Sci. 2016 May 12;7:641. doi: 10.3389/fpls.2016.00641. eCollection 2016.
Pathogen-Host interaction data is core to our understanding of disease processes and their molecular/genetic bases. Facile access to such core data is particularly important for the plant sciences, where individual genetic and phenotypic observations have the added complexity of being dispersed over a wide diversity of plant species, in contrast to the relatively few host species of interest to biomedical researchers. Recently, an international initiative interested in scholarly data publishing proposed that all scientific data should be "FAIR": Findable, Accessible, Interoperable, and Reusable. In this work, we describe the process of migrating a database of notable relevance to the plant sciences, the Pathogen-Host Interaction Database (PHI-base), to a form that conforms to each of the FAIR Principles. We discuss the technical and architectural decisions, and the migration pathway, including observations of the difficulty and/or fidelity of each step. We examine how multiple FAIR principles can be addressed simultaneously through careful design decisions, including making data FAIR for both humans and machines with minimal duplication of effort. We note how FAIR data publishing involves more than data reformatting, requiring features beyond those exhibited by most life science Semantic Web or Linked Data resources. We explore the value added by completing this FAIR data transformation, and then test the result through integrative questions that could not easily be asked over traditional Web-based data resources. Finally, we demonstrate the utility of providing explicit and reliable access to provenance information, which we argue enhances citation rates by encouraging and facilitating transparent scholarly reuse of these valuable data holdings.