Granata Ilaria, Sangiovanni Mara, Maiorano Francesco, Miele Marco, Guarracino Mario Rosario
High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy.
BMC Bioinformatics. 2016 Nov 8;17(Suppl 12):376. doi: 10.1186/s12859-016-1197-0.
One of the most challenging issues in the variant calling process is handling the resulting data and filtering the genes so as to retain only those strictly related to the topic of interest. Several tools make it possible to gather annotations at different levels of complexity for the detected genes and to group them according to the pathways and/or processes they belong to. However, this can be a time-consuming and frustrating task. This is partly due to the size of the file, which may contain many thousands of genes, and partly to the search for associated variants, which requires a gene-by-gene investigation and annotation approach. As a consequence, the initial gene list is often reduced by exploiting knowledge of variant effect, novelty and genotype, with the potential risk of losing meaningful information.
Here we present Var2GO, a new web-based tool that supports the annotation and filtering of variants and genes produced by variant calling on high-throughput sequencing data. Var2GO accepts either an unprocessed Variant Call Format (VCF) file or a table containing already annotated variants. Raw data first undergo a preliminary variant annotation step, performed with the SnpEff tool, and are converted to a table format. The table is then loaded into a database generated on the fly. Genes associated with the variants are automatically annotated with the corresponding Gene Ontology (GO) terms, covering the three GO domains. Through the web interface it is then possible to filter the whole list and extract the genes that have annotations in the domain of interest, simply by specifying filtering parameters and one or more keywords. The relevance of this tool is demonstrated on exome sequencing data.
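The keyword-based filtering step described above can be illustrated with a minimal sketch. It is not Var2GO's actual implementation: the table layout, the function names `build_db` and `filter_genes`, and the toy gene, effect and GO term rows are all assumptions made for illustration, and an in-memory SQLite database stands in for the database generated on the fly.

```python
import sqlite3

# Hypothetical toy rows: (gene, variant effect), as they might look after annotation.
VARIANTS = [
    ("BRCA1", "missense_variant"),
    ("TP53", "stop_gained"),
    ("TTN", "synonymous_variant"),
]

# Hypothetical GO annotations: (gene, GO domain, term name).
GO_TERMS = [
    ("BRCA1", "biological_process", "DNA repair"),
    ("TP53", "biological_process", "apoptotic process"),
    ("TP53", "molecular_function", "DNA binding"),
    ("TTN", "cellular_component", "sarcomere"),
]


def build_db():
    """Create an in-memory database holding the variant and GO tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE variants (gene TEXT, effect TEXT)")
    conn.execute("CREATE TABLE go_terms (gene TEXT, domain TEXT, term TEXT)")
    conn.executemany("INSERT INTO variants VALUES (?, ?)", VARIANTS)
    conn.executemany("INSERT INTO go_terms VALUES (?, ?, ?)", GO_TERMS)
    return conn


def filter_genes(conn, domain, keywords, effects=None):
    """Return sorted gene names whose GO terms in `domain` match any keyword.

    `effects`, if given, additionally restricts the result to variants
    carrying one of the listed effects.
    """
    if not keywords:
        return []
    # One LIKE clause per keyword; SQLite's LIKE is case-insensitive for ASCII.
    keyword_clause = " OR ".join("g.term LIKE ?" for _ in keywords)
    sql = (
        "SELECT DISTINCT v.gene FROM variants v "
        "JOIN go_terms g ON g.gene = v.gene "
        f"WHERE g.domain = ? AND ({keyword_clause})"
    )
    params = [domain] + [f"%{k}%" for k in keywords]
    if effects:
        placeholders = ",".join("?" for _ in effects)
        sql += f" AND v.effect IN ({placeholders})"
        params += list(effects)
    sql += " ORDER BY v.gene"
    return [row[0] for row in conn.execute(sql, params)]
```

With this toy data, `filter_genes(build_db(), "biological_process", ["DNA"])` returns `["BRCA1"]`, because only BRCA1 has a biological-process term containing the keyword. Passing user input as bound parameters rather than interpolating it into the query keeps the keyword search safe against SQL injection.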
Var2GO is a novel tool that implements a topic-based approach, expressly designed to help biologists narrow the search for relevant genes resulting from variant calling analysis. Its main purpose is to support non-bioinformaticians in handling and processing raw variant calling data through an intuitive web interface. Furthermore, Var2GO offers a complete pipeline that, starting from the raw VCF file, allows users to annotate both variants and associated genes and supports the extraction of relevant biological knowledge.