Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6.
Département de sciences biologiques, Université de Montréal, Montréal, Canada H3C 3J7.
Proc Biol Sci. 2022 May 25;289(1975):20212780. doi: 10.1098/rspb.2021.2780. Epub 2022 May 18.
Many leading journals in ecology and evolution now mandate open data upon publication. Yet, there is very little oversight to ensure the completeness and reusability of archived datasets, and we currently have a poor understanding of the factors associated with high-quality data sharing. We assessed 362 open datasets linked to first- or senior-authored papers published by 100 principal investigators (PIs) in the fields of ecology and evolution over a period of 7 years to identify predictors of data completeness and reusability (data archiving quality). Datasets scored low on these metrics: 56.4% were complete and 45.9% were reusable. Data reusability, but not completeness, was slightly higher for more recently archived datasets and PIs with less seniority. Journal open data policy, PI gender and PI corresponding author status were unrelated to data archiving quality. However, PI identity explained a large proportion of the variance in data completeness (27.8%) and reusability (22.0%), indicating consistent inter-individual differences in data sharing practices by PIs across time and contexts. Several PIs consistently shared data of either high or low archiving quality, but most PIs were inconsistent in how well they shared. One explanation for the high intra-individual variation we observed is that PIs often conduct research through students and postdoctoral researchers, who may be responsible for the data collection, curation and archiving. Levels of data literacy vary among trainees and PIs may not regularly perform quality control over archived files. Our findings suggest that research data management training and culture within a PI's group are likely to be more important determinants of data archiving quality than other factors such as a journal's open data policy. Greater incentives and training for individual researchers at all career stages could improve data sharing practices and enhance data transparency and reusability.