Ontology Engineering Group, Facultad de Informática, Universidad Politécnica de Madrid, Madrid, Spain.
PLoS One. 2013 Nov 27;8(11):e80278. doi: 10.1371/journal.pone.0080278. eCollection 2013.
How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition, the reader will already know that the answer is "with difficulty" or "not at all". In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and to make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort and required using new versions and new software, which posed challenges to reconstructing and validating the results. The quantification leads to "reproducibility maps", which reveal that novice researchers would be able to reproduce only a few of the steps in the method, and that only expert researchers with advance knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a set of desiderata comprising our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one's own laboratory.