Delescluse Matthieu, Franconville Romain, Joucla Sébastien, Lieury Tiffany, Pouzat Christophe
Laboratoire de physiologie cérébrale, CNRS UMR 8118, UFR biomédicale, Université Paris-Descartes, 45 rue des Saints-Pères, 75006 Paris, France.
J Physiol Paris. 2012 May-Aug;106(3-4):159-70. doi: 10.1016/j.jphysparis.2011.09.011. Epub 2011 Oct 4.
Reproducible data analysis is an approach that aims to complement classical printed scientific articles with everything required to independently reproduce the results they present. "Everything" here covers the data, the computer code and a precise description of how the code was applied to the data. A brief history of this approach is presented first, starting with what economists have called replication since the early 1980s and ending with what is now called reproducible research in fields oriented toward computational data analysis, such as statistics and signal processing. Since efficient tools are instrumental for routine implementation of these approaches, some of the available ones are described next. A toy example then demonstrates the use of two open-source software programs for reproducible data analysis: the "Sweave family" and the org-mode of emacs. The former is bound to R, while the latter can be used with R, Matlab, Python and many more "generalist" data processing software packages. Both solutions can be used with the Unix-like, Windows and Mac families of operating systems. It is argued that neuroscientists could communicate their results much more efficiently by adopting the reproducible research paradigm from their lab books all the way to their articles, theses and books.