SnakeLines: integrated set of computational pipelines for sequencing reads.

Affiliations

Geneton Ltd., 841 04 Bratislava, Slovakia.

Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia.

Publication information

J Integr Bioinform. 2023 Aug 21;20(3). doi: 10.1515/jib-2022-0059. eCollection 2023 Sep 1.

Abstract

With the rapid growth of massively parallel sequencing technologies, more and more laboratories are using sequenced DNA fragments for genomic analyses. Interpretation of sequencing data, however, depends strongly on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computing centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with the methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analyses across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, notably in the Slovak national surveillance of SARS-CoV-2.
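The idea of steering a whole analysis from a single configuration file can be illustrated with a minimal sketch. The step names, method names, parameters, and helper functions below are hypothetical and chosen only for illustration; they are not taken from the SnakeLines configuration format, which may look quite different.

```python
# Hypothetical configuration: the keys, tool choices, and parameters are
# illustrative assumptions, not the actual SnakeLines schema.
CONFIG = {
    "trimming": {"method": "trimmomatic", "params": {"min_length": 35}},
    "mapping": {"method": "bowtie2", "params": {"threads": 4}},
    "variant_calling": {"method": "bcftools", "params": {"min_quality": 20}},
}


def plan_steps(config):
    """Return the ordered list of (step, method, params) described by the config."""
    plan = []
    for step, settings in config.items():
        # Copy the parameters so callers cannot mutate the configuration.
        plan.append((step, settings["method"], dict(settings.get("params", {}))))
    return plan


def swap_method(config, step, method, **params):
    """Return a copy of the config with one step's method and parameters replaced."""
    updated = {name: dict(settings) for name, settings in config.items()}
    if step not in updated:
        raise KeyError(f"unknown step: {step}")
    updated[step] = {"method": method, "params": params}
    return updated


if __name__ == "__main__":
    # Changing one entry alters the analysis without touching pipeline code.
    for step, method, params in plan_steps(swap_method(CONFIG, "mapping", "bwa", threads=8)):
        print(step, method, params)
```

In this sketch, swapping the mapper is a one-entry change to the configuration, which is the kind of modification the abstract describes; the pipeline logic itself stays untouched.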

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2f9/10757078/9ce3b2bdd368/j_jib-2022-0059_fig_001.jpg
