Rioualen Claire, Charbonnier-Khamvongsa Lucie, Collado-Vides Julio, van Helden Jacques
Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France.
Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México.
Curr Protoc Bioinformatics. 2019 Jun;66(1):e72. doi: 10.1002/cpbi.72. Epub 2019 Feb 20.
Next-generation sequencing (NGS) is becoming a routine approach in most domains of the life sciences. To ensure reproducibility of results, there is a crucial need to improve the automation of NGS data processing and to enable forthcoming studies that rely on big datasets. Although user-friendly interfaces now exist, there remains a strong need for accessible solutions that allow experimental biologists to analyze and explore their results in an autonomous and flexible way. The protocols here describe a modular system that enables a user to compose and fine-tune workflows based on SnakeChunks, a library of rules for the Snakemake workflow engine. They are illustrated with a study that combines ChIP-seq and RNA-seq to identify target genes of the global transcription factor FNR in Escherichia coli. This choice has the advantage that results can be compared with the most up-to-date collection of existing knowledge about transcriptional regulation in this model organism, extracted from the RegulonDB database. © 2019 by John Wiley & Sons, Inc.