Reer Aaron, Wiebe Andreas, Wang Xu, Rieger Jochem W
Applied Neurocognitive Psychology Lab, Institute for Medicine and Healthcare, Department of Psychology, Oldenburg University, Oldenburg, Germany.
Chair for Intellectual Property and Information Law, Göttingen University, Göttingen, Germany.
Front Genet. 2023 Mar 13;14:1086802. doi: 10.3389/fgene.2023.1086802. eCollection 2023.
Modern AI-supported research holds many promises for basic and applied science. However, the application of AI methods is often limited because most labs cannot, on their own, acquire the large and diverse datasets that are best for training these methods. Data sharing and open science initiatives promise some relief, but only if the data are provided in a usable way. The FAIR principles state very general requirements for useful data sharing: data should be findable, accessible, interoperable, and reusable. This article focuses on two challenges in implementing the FAIR framework for human neuroscience data. First, human data can fall under special legal protection. The legal frameworks regulating how and what data can be openly shared differ greatly across countries, which can complicate data sharing or even discourage researchers from doing so. Second, openly accessible data require standardized organization and annotation of data and metadata in order to become interpretable and useful. This article briefly introduces open neuroscience initiatives that support the implementation of the FAIR principles. It then reviews legal frameworks, their consequences for the accessibility of human neuroscientific data, and some ethical implications. We hope this comparison of legal jurisdictions helps to elucidate that some alleged obstacles to data sharing require only an adaptation of procedures while helping to protect the privacy of our most generous donors to research: our study participants. Finally, the article elaborates on the problem of missing standards for metadata annotation and introduces initiatives that aim at developing tools to make neuroscientific data acquisition and analysis pipelines FAIR by design. While the paper focuses on making human neuroscience data useful for data-intensive AI, the general considerations hold for other fields where large amounts of openly available human data would be helpful.