Visentin Luca, Munaron Luca, Ruffinatti Federico Alessandro
Department of Life Sciences and Systems Biology, University of Turin, Turin, 10136, Italy.
F1000Res. 2025 Apr 4;14:88. doi: 10.12688/f1000research.157325.1. eCollection 2025.
Structuring data analysis projects, that is, defining the layout of files and folders needed to analyze data using existing tools and novel code, largely follows personal preferences. Open Science calls for more accessible, transparent and understandable research. We believe that Open Science principles can be applied to the way data analysis projects are structured.
We examine the structure of several data analysis project templates by analyzing project template repositories hosted on GitHub. By visualizing the resulting consensus structure, we make observations about how the ecosystem of project structures is shaped and what its salient characteristics are.
Project templates show little overlap, but many distinct practices can be identified. We combine these practices with the wider Open Science philosophy to derive a few fundamental Design Principles that guide researchers when designing a project space. We present Kerblam!, a project management tool that can work with such a project structure to expedite data handling, execute workflow managers, and share the resulting workflow and analysis outputs with others.
We hope that, by following these principles and using Kerblam!, the landscape of data analysis projects can become more transparent, understandable, and ultimately useful to the wider community.