Sheoran Anushka, Fond Kenneth A, Davis Lex Maliga, Huie J Russell, Vavrek Romana, Axtman P J, Lemmon Vance, Bixby John L, Visser Ubbo, Gensel John C, Fouad Karim, Ferguson Adam R, Grethe Jeffrey S, Bandrowski Anita, Martone Maryann E, Torres-Espin Abel
Department of Neuroscience, University of California San Diego, San Diego, CA, USA.
Department of Neurological Surgery, Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California San Francisco, San Francisco, CA, USA.
Exp Neurol. 2025 Mar;385:115100. doi: 10.1016/j.expneurol.2024.115100. Epub 2024 Dec 7.
Data interoperability is crucial for effectively combining data for scientific inquiry. To facilitate interoperability, data standards, such as common definitions of variables, are often developed. The Open Data Commons for Spinal Cord Injury (odc-sci.org) has established an initial set of community-based data elements (CoDEs), a minimal set of variables for sharing, to promote data interoperability in SCI research in line with the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. We sought to understand how the SCI community uses CoDEs in order to inform current standards adherence and future standards development. We systematically analyzed 39 public datasets against 17 required CoDEs and found variations between the reported data and the structure specified by the CoDEs. Overall, we found that enforcing data standards improved reporting rates of CoDE variables. Notably, different variables required different levels of curation to ensure semantic equivalence among datasets. We also uncovered specific reporting habits of researchers, such as formatting and naming patterns. We identified a need for different data standards depending on the nature of the study (e.g., human study, derivative study), along with a detailed list of issues that should be addressed when implementing such standards. Among the various approaches to developing data standards, ODC-SCI adopted a semi-formal approach, creating standards that are easy for users to adopt. Our data-driven evaluation of actual reporting behavior shows that this flexibility can lead to subsequent problems in harmonization. This study serves as a baseline analysis of reporting behaviors that can help shape and facilitate data standards.