Dulitzki Coby, Crane Steven Michael, Hardwicke Tom E, Ioannidis John P A
Department of Biology, Stanford University, Stanford, CA, USA.
Stanford Prevention Research Center, Stanford School of Medicine, Stanford, CA, USA.
R Soc Open Sci. 2024 May 15;11(5):240016. doi: 10.1098/rsos.240016. eCollection 2024 May.
Access to scientific data can enable independent reuse and verification; however, most data are not available and become increasingly irrecoverable over time. This study aimed to retrieve and preserve important datasets from 160 of the most highly cited social science articles published in 2008-2013 and 2015-2018. We asked authors whether they would share data in a public repository (the Data Ark) or, if data could not be shared, provide their reasons. Of the 160 articles, data for 117 (73%, 95% CI [67%-80%]) were not available and data for 7 (4%, 95% CI [0%-12%]) were available with restrictions. Data for 36 (22%, 95% CI [16%-30%]) articles were available in unrestricted form: 29 of these datasets were already available, and 7 were made available in the Data Ark. Most authors did not respond to our data requests, and a minority gave reasons for not sharing, such as legal or ethical constraints. These findings highlight an unresolved need to preserve important scientific datasets and increase their accessibility to the scientific community.