Dempsey William, Foster Ian, Fraser Scott, Kesselman Carl
University of Southern California.
University of Chicago and Argonne National Laboratory.
Harv Data Sci Rev. 2022 Summer;4(3). doi: 10.1162/99608f92.44d21b86. Epub 2022 Jul 28.
The broad sharing of research data is widely viewed as critical for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data and the frequency of data reuse remain stubbornly low. We argue here that a significant reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project, when publications are being prepared, by which time essential details are no longer accessible. Thus, we propose an approach to research informatics in which FAIR principles are applied from the inception of a research project and to every data asset produced by experiment or computation. We suggest that this seemingly challenging task can be made feasible by the adoption of simple tools, such as lightweight identifiers (to ensure that every data asset is findable), packaging methods (to facilitate understanding of data contents), data access methods, and metadata organization and structuring tools (to support schema development and evolution). We use an example from experimental neuroscience to illustrate how these methods can work in practice.
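To make the idea of lightweight identifiers and packaging concrete, the following is a minimal sketch, not the authors' tooling. It assumes a hypothetical identifier scheme (a random UUID written as a URN) and a hypothetical manifest layout; the function names `describe_asset` and `build_manifest` are illustrative only. Each data asset receives an identifier and a SHA-256 checksum at the moment it is created, and the records are gathered into a JSON manifest that travels with the data.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Mapping


def describe_asset(name: str, content: bytes, creator: str) -> dict:
    """Return a small metadata record for one data asset."""
    return {
        # Hypothetical identifier scheme: a random UUID expressed as a URN.
        "id": f"urn:uuid:{uuid.uuid4()}",
        "name": name,
        "creator": creator,
        # Record creation time in UTC so records from different sites compare.
        "created": datetime.now(timezone.utc).isoformat(),
        "length": len(content),
        # The checksum lets a later reader verify the bytes are unchanged.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def build_manifest(assets: Mapping[str, bytes], creator: str) -> str:
    """Package metadata records for several assets into one JSON manifest."""
    records = [
        describe_asset(name, content, creator)
        for name, content in sorted(assets.items())
    ]
    return json.dumps({"assets": records}, indent=2)


if __name__ == "__main__":
    # Illustrative data only; the file names and contents are made up.
    example = {
        "trial_001.csv": b"time,signal\n0,0.1\n1,0.3\n",
        "notes.txt": b"Recorded during pilot session.\n",
    }
    print(build_manifest(example, creator="example-lab"))
```

Generating the record when the asset is produced, rather than at publication time, is the point of the sketch: the creator, timestamp, and checksum are captured while they are still known.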