Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom.
MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom.
PLoS One. 2021 May 13;16(5):e0250887. doi: 10.1371/journal.pone.0250887. eCollection 2021.
To determine whether medRxiv data availability statements describe open or closed data (that is, whether the data used in the study is openly available without restriction) and to examine whether this changes on publication depending on journal data-sharing policy. Additionally, to examine whether data availability statements are sufficient to capture code availability declarations.
Observational study, following a pre-registered protocol, of preprints posted on the medRxiv repository between 25th June 2019 and 1st May 2020 and their published counterparts.
Distribution of preprinted data availability statements across nine categories, determined by a prespecified classification system. Change in the percentage of data availability statements describing open data between the preprinted and published versions of the same record, stratified by journal sharing policy. Number of code availability declarations reported in the full-text preprint which were not captured in the corresponding data availability statement.
3938 medRxiv preprints with an applicable data availability statement were included in our sample, of which 911 (23.1%) were categorized as describing open data. 379 (9.6%) preprints were subsequently published, and of these published articles, only 155 contained an applicable data availability statement. As at the preprint stage, only a minority of these published data availability statements (59; 38.1%) described open data. Of the 151 records eligible for the comparison between preprinted and published stages, 57 (37.7%) were published in journals that mandated open data sharing. Data availability statements more frequently described open data on publication when the journal mandated data sharing (open at preprint: 33.3%, open at publication: 61.4%) than when the journal did not mandate data sharing (open at preprint: 20.2%, open at publication: 22.3%).
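The reported proportions can be re-derived from the counts given above. The following is a minimal sketch for checking that arithmetic, not the authors' analysis code. The names `pct`, `counts`, `reported_open` and `change` are illustrative assumptions. The percentage-point changes are computed from the reported percentages, because the underlying stratum counts are not stated here.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# (numerator, denominator) pairs taken directly from the reported results.
counts = {
    "open_at_preprint": (911, 3938),
    "subsequently_published": (379, 3938),
    "open_at_publication": (59, 155),
    "published_in_mandating_journals": (57, 151),
}
percentages = {name: pct(part, whole) for name, (part, whole) in counts.items()}

# Reported (open at preprint %, open at publication %) by journal policy.
reported_open = {
    "mandated": (33.3, 61.4),
    "not_mandated": (20.2, 22.3),
}
# Change in percentage points between the preprint and published stages.
change = {
    policy: round(published - preprint, 1)
    for policy, (preprint, published) in reported_open.items()
}

for name, value in percentages.items():
    print(f"{name}: {value}%")
for policy, delta in change.items():
    print(f"{policy}: {delta:+} percentage points")
```

On these figures, the share of statements describing open data rose by about 28 percentage points where data sharing was mandated, against about 2 percentage points where it was not.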
Requiring that authors submit a data availability statement is a good first step, but it is insufficient to ensure data availability. Strict editorial policies that mandate data sharing (where appropriate) as a condition of publication appear to be effective in making research data available. We strongly encourage all journal editors to examine whether their data availability policies are sufficiently stringent and consistently enforced.