Science for Life Laboratory, Uppsala University, Uppsala, SE-750 03, Sweden.
Uppsala Multidisciplinary Center for Advanced Computational Science, Uppsala University, Uppsala, SE-751 05, Sweden.
Gigascience. 2018 May 1;7(5). doi: 10.1093/gigascience/giy028.
Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at the UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare the growth, administrative burden, and cluster usage of NGS projects with those of projects from other sciences.
The number of NGS projects has grown rapidly since 2010, driven by the entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user as non-NGS users, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than those of non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve efficiency somewhat.
Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects, along with anonymized versions of our storage, job, and efficiency databases.