A large-scale study on research code quality and execution.

Affiliations

Institute for Quantitative Social Science, Harvard University, Cambridge, MA, USA.

CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology, Chinese Academy of Sciences, Shenyang, China.

Publication information

Sci Data. 2022 Feb 21;9(1):60. doi: 10.1038/s41597-022-01143-6.

DOI: 10.1038/s41597-022-01143-6
PMID: 35190569
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8861064/
Abstract

This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals' collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
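The re-execution step described in the abstract (run each file in a clean environment, record whether it completes without error, then report a failure rate) can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not describe in detail. The function names `run_script` and `failure_rate`, the three outcome labels, and the 60-second default timeout are assumptions made for illustration. Python one-liners stand in for R files so the example runs without an R installation.

```python
import subprocess
import sys
from collections import Counter


def run_script(command, timeout_seconds=60):
    """Run one script as a subprocess and classify the outcome.

    Returns "success" (exit code 0), "error" (non-zero exit code or
    missing interpreter), or "timeout". The labels are illustrative.
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,  # keep stdout/stderr out of the console
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        # The interpreter named in command[0] is not installed.
        return "error"
    return "success" if result.returncode == 0 else "error"


def failure_rate(outcomes):
    """Fraction of runs that did not finish with "success"."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["success"]) / total


if __name__ == "__main__":
    # Stand-in commands. For an R file one might instead use
    # ["Rscript", "--vanilla", "analysis.R"], where "analysis.R" is a
    # hypothetical file name and --vanilla skips saved workspaces and
    # user profiles.
    commands = [
        [sys.executable, "-c", "print('ok')"],
        [sys.executable, "-c", "raise SystemExit(1)"],
    ]
    outcomes = [run_script(command) for command in commands]
    print(outcomes, failure_rate(outcomes))
```

Counting timeouts as failures is a design choice of this sketch. Comparing `failure_rate` before and after a cleaning pass would give the kind of before/after figures the abstract reports (74% and 56%).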

