Institut Pasteur, Université Paris Cité, Plate-Forme Technologique Biomics, 75015 Paris, France.
Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015 Paris, France.
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i11-i20. doi: 10.1093/bioinformatics/btad227.
The reproducibility crisis has highlighted the importance of improving the way bioinformatics data analyses are implemented, executed, and shared. To address this, various tools such as content versioning systems, workflow management systems, and software environment management systems have been developed. While these tools are becoming more widely used, there is still much work to be done to increase their adoption. The most effective way to ensure reproducibility becomes a standard part of most bioinformatics data analysis projects is to integrate it into the curriculum of bioinformatics Master's programs.
In this article, we present the Reprohackathon, a Master's course that we have been running for the last 3 years at Université Paris-Saclay (France), and that has been attended by a total of 123 students. The course is divided into two parts. The first part includes lessons on the challenges related to reproducibility, content versioning systems, container management, and workflow systems. In the second part, students work on a data analysis project for 3-4 months, reanalyzing data from a previously published study. The Reprohackathon has taught us many valuable lessons, such as the fact that implementing reproducible analyses is a complex and challenging task that requires significant effort. However, providing in-depth teaching of the concepts and the tools during a Master's degree program greatly improves students' understanding and abilities in this area.