
Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition.

Author information

Hardwicke Tom E, Mathur Maya B, MacDonald Kyle, Nilsonne Gustav, Banks George C, Kidwell Mallory C, Hofelich Mohr Alicia, Clayton Elizabeth, Yoon Erica J, Henry Tessler Michael, Lenne Richie L, Altman Sara, Long Bria, Frank Michael C

Affiliations

Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Palo Alto, CA, USA.

Quantitative Sciences Unit, Stanford University, Palo Alto, CA, USA.

Publication information

R Soc Open Sci. 2018 Aug 15;5(8):180448. doi: 10.1098/rsos.180448. eCollection 2018 Aug.

Abstract

Access to data is a critical feature of an efficient, progressive and ultimately self-correcting scientific ecosystem. But the extent to which in-principle benefits of data sharing are realized in practice is unclear. Crucially, it is largely unknown whether published findings can be reproduced by repeating reported analyses upon shared data ('analytic reproducibility'). To investigate this, we conducted an observational evaluation of a mandatory open data policy introduced at the journal Cognition. Interrupted time-series analyses indicated a substantial post-policy increase in data available statements (104/417, 25% pre-policy to 136/174, 78% post-policy), although not all data appeared reusable (23/104, 22% pre-policy to 85/136, 62% post-policy). For 35 of the articles determined to have reusable data, we attempted to reproduce 1324 target values. Ultimately, 64 values could not be reproduced within a 10% margin of error. For 22 articles all target values were reproduced, but 11 of these required author assistance. For 13 articles at least one value could not be reproduced despite author assistance. Importantly, there were no clear indications that original conclusions were seriously impacted. Mandatory open data policies can increase the frequency and quality of data sharing. However, suboptimal data curation, unclear analysis specification and reporting errors can impede analytic reproducibility, undermining the utility of data sharing and the credibility of scientific findings.
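
The proportions quoted in the abstract, and the idea of a value being reproduced "within a 10% margin of error", can be illustrated with a minimal Python sketch. The counts are taken from the abstract. The `percentage_error` definition (absolute difference relative to the reported value) is an assumption made for illustration, since the abstract does not state the exact formula, and the function names and the example values 2.50, 2.45 and 3.00 are hypothetical.

```python
def pct(count, total):
    """Return count/total as a percentage."""
    return 100.0 * count / total


def percentage_error(reported, obtained):
    """Absolute difference relative to the reported value, in percent.

    Assumed definition: the abstract only says "within a 10% margin of error".
    """
    if reported == 0:
        raise ValueError("percentage error is undefined for a reported value of 0")
    return 100.0 * abs(obtained - reported) / abs(reported)


def within_margin(reported, obtained, margin=10.0):
    """True if the obtained value falls inside the margin (in percent)."""
    return percentage_error(reported, obtained) < margin


# Counts quoted in the abstract: (numerator, denominator).
counts = {
    "data available statements, pre-policy": (104, 417),
    "data available statements, post-policy": (136, 174),
    "reusable data, pre-policy": (23, 104),
    "reusable data, post-policy": (85, 136),
    "target values not reproduced": (64, 1324),
}

for label, (n, d) in counts.items():
    print(f"{label}: {n}/{d} = {pct(n, d):.1f}%")

# Hypothetical reported/obtained pairs, not taken from the study.
print(within_margin(2.50, 2.45))  # small discrepancy
print(within_margin(2.50, 3.00))  # large discrepancy
```

Under this assumed definition, a reported value of 2.50 reproduced as 2.45 counts as within the margin, while 3.00 does not. The counts also show that the 22 fully reproduced articles and the 13 with at least one unreproduced value together account for all 35 articles examined.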


