Ram Karthik
Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA 94720, USA.
Source Code Biol Med. 2013 Feb 28;8(1):7. doi: 10.1186/1751-0473-8-7.
Reproducibility is the hallmark of good science. Maintaining a high degree of transparency in scientific reporting is essential not just for gaining trust and credibility within the scientific community but also for facilitating the development of new ideas. Sharing data and computer code associated with publications is becoming increasingly common, partly in response to data deposition requirements from journals and mandates from funders. Despite this increase in transparency, it is still difficult to reproduce or build upon the findings of most scientific publications without access to a more complete workflow.
Version control systems (VCS), which have long been used to maintain code repositories in the software industry, are now finding new applications in science. One such open source VCS, Git, provides a lightweight yet robust framework that is ideal for managing the full suite of research outputs such as datasets, statistical code, figures, lab notes, and manuscripts. For individual researchers, Git provides a powerful way to track and compare versions, retrace errors, and explore new approaches in a structured manner, all while maintaining a full audit trail. For larger collaborative efforts, Git and Git hosting services make it possible for everyone to work asynchronously and merge their contributions at any time, while maintaining a complete authorship trail. In this paper I provide an overview of Git along with use-cases that highlight how this tool can be leveraged to make science more reproducible and transparent, foster new collaborations, and support novel uses.