NaviSE: superenhancer navigator integrating epigenomics signal algebra.

Authors

Ascensión Alex M, Arrospide-Elgarresta Mikel, Izeta Ander, Araúzo-Bravo Marcos J

Affiliations

Computational Biology and Systems Biomedicine, Biodonostia Health Research Institute, San Sebastián, 20014, Spain.

Tissue Engineering Laboratory, Bioengineering Area, Biodonostia Health Research Institute, San Sebastián, 20014, Spain.

Publication details

BMC Bioinformatics. 2017 Jun 6;18(1):296. doi: 10.1186/s12859-017-1698-5.

PMID:28587674

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5461685/

Abstract

BACKGROUND

Superenhancers are crucial structural genomic elements that determine cell fate, and they are also involved in several diseases, such as cancer and neurodegeneration. Although there are pipelines that chain independent pieces of software to predict superenhancers from genome-wide chromatin marks or from the binding sites of DNA-interacting proteins, there is not yet an integrated software tool that automatically processes algebraic combinations of raw sequencing data into a comprehensive, annotated final report of predicted superenhancers.

RESULTS

We have developed NaviSE, a user-friendly, streamlined tool that performs fully automated parallel processing of genome-wide epigenomics data, from sequencing files to a final report built from a comprehensive set of annotated files, which are navigated through a graphical user interface dynamically generated by NaviSE. NaviSE also implements an 'epigenomics signal algebra' that allows multiple activation and repression epigenomics signals to be combined. NaviSE provides an interactive chromosomal landscape of superenhancer locations, which can be navigated to obtain annotated information about, among other features, the superenhancer signal profile, associated genes, gene ontology enrichment analysis, transcription factor binding site motifs enriched in superenhancers, graphs of the metrics evaluating superenhancer quality, protein-protein interaction networks and enriched metabolic pathways. We have parallelised the most time-consuming tasks, reducing processing time by up to 30% on a 15-CPU machine. We have optimized the default parameters of NaviSE to facilitate its use. NaviSE accepts data at different entry levels of processing, from sra-fastq files to bed files, and unifies the processing of multiple replicates. NaviSE outperforms the more time-consuming processing required by a non-integrated pipeline. Alongside its high performance, NaviSE provides biological insights, predicting cell type-specific markers such as SOX2 and ZIC3 in embryonic stem cells, CDK5R1 and REST in neurons, and CD86 and TLR2 in monocytes.
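The abstract does not define the signal algebra or the superenhancer-calling criterion. As a rough illustration only, the Python sketch below assumes a hypothetical combination rule (per candidate region, the sum of activation tracks minus the sum of repression tracks, floored at zero). It then applies the widely used ROSE-style "hockey-stick" cutoff: regions are ranked by signal, and those above the point where a line with the curve's diagonal slope touches the sorted curve are called superenhancers. The function names, track names and toy values are hypothetical, and this is not NaviSE's code.

```python
# Hypothetical sketch, not NaviSE's implementation: an assumed
# "signal algebra" rule followed by a ROSE-style hockey-stick cutoff.
from typing import List


def signal_algebra(activation: List[List[float]],
                   repression: List[List[float]]) -> List[float]:
    """Assumed rule: for each region, sum the activation tracks, subtract
    the repression tracks, and floor the result at zero."""
    n_regions = len(activation[0])
    combined = []
    for r in range(n_regions):
        act = sum(track[r] for track in activation)
        rep = sum(track[r] for track in repression)
        combined.append(max(float(act - rep), 0.0))
    return combined


def hockey_stick_cutoff(signals: List[float]) -> float:
    """Sort signals ascending and return the value at the point where a
    line with the curve's diagonal slope is tangent to it."""
    ys = sorted(signals)
    slope = (ys[-1] - ys[0]) / len(ys)
    # A line through point i leaves the fewest points below it when
    # y_i - slope * i is minimal; that point is the tangent.
    tangent = min(range(len(ys)), key=lambda i: ys[i] - slope * i)
    return ys[tangent]


def call_superenhancers(region_ids: List[str],
                        signals: List[float]) -> List[str]:
    """Regions whose combined signal exceeds the cutoff are called."""
    cutoff = hockey_stick_cutoff(signals)
    return [rid for rid, s in zip(region_ids, signals) if s > cutoff]


# Toy example with made-up values for ten candidate regions.
regions = [f"r{i}" for i in range(1, 11)]
h3k27ac = [1, 1, 1, 1, 1, 1, 1, 1, 6, 30]   # activation track
med1 = [0, 0, 0, 0, 0, 0, 0, 0, 5, 25]      # activation track
h3k27me3 = [3, 0, 0, 0, 0, 0, 0, 0, 1, 5]   # repression track

combined = signal_algebra([h3k27ac, med1], [h3k27me3])
print(call_superenhancers(regions, combined))  # ['r9', 'r10']
```

Flooring at zero mirrors ROSE's practice of setting regions with more control than ranking signal to zero. The abstract does not say whether NaviSE weights or normalises tracks differently.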

CONCLUSIONS

NaviSE is a user-friendly, streamlined solution for superenhancer analysis, annotation and navigation, requiring only basic computing and next-generation sequencing knowledge. NaviSE binaries and documentation are available at: https://sourceforge.net/projects/navise-superenhancer/ .