VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database.

Affiliations

Telethon Institute for Genetics and Medicine, Viale Campi Flegrei, 34, 80078, Pozzuoli (Naples), Italy.

Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, Istituto di Ricovero e Cura a Carattere Scientifico, Rome, Italy.

Publication information

BMC Bioinformatics. 2018 Dec 12;19(1):477. doi: 10.1186/s12859-018-2532-4.

Abstract

BACKGROUND

Targeted resequencing has become the most widely used and cost-effective approach for identifying causative mutations of Mendelian diseases, for both diagnostic and research purposes. Due to very rapid technological progress, NGS laboratories are expanding their capabilities to handle the increasing number of analyses. Several open-source tools are available to build a generic variant calling pipeline, but a tool able to simultaneously execute multiple analyses and to organize and categorize the samples is still missing.

RESULTS

Here we describe VarGenius, a Linux-based command-line software able to execute customizable pipelines for the analysis of multiple targeted resequencing datasets using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the "joint analysis" of hundreds of samples with a single command, drastically reducing the time needed to configure and execute the analysis. VarGenius executes the standard pipeline of the Genome Analysis Toolkit (GATK) best practices (GBP) for germline variant calling, annotates the variants using Annovar, and generates user-friendly output that displays the results in a web page. VarGenius has been tested on a parallel computing cluster of 52 machines with 120 GB of RAM each. Under this configuration, a 50 M whole exome sequencing (WES) analysis for a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES in about 24 h, and the parallel analysis of 34 single samples from a 1 M panel in about 2 h.
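The "internal allelic variant frequencies" mentioned above can be illustrated with a minimal Python sketch. It assumes VCF-style genotype strings (such as 0/1) and a hypothetical in-memory layout of samples and variants; the function name, variant identifiers, and data are invented for illustration and are not taken from VarGenius.

```python
from collections import Counter


def internal_allele_frequencies(genotypes):
    """Compute the alternate-allele frequency of each variant within a cohort.

    `genotypes` maps sample name -> {variant_id: genotype string}, where the
    genotype string follows the VCF convention ('0/1', '1|1', './.').
    This layout is a hypothetical stand-in, not the VarGenius data model.
    """
    alt_alleles = Counter()
    called_alleles = Counter()
    for calls in genotypes.values():
        for variant_id, genotype in calls.items():
            # Treat phased ('|') and unphased ('/') genotypes the same way.
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call: not counted in the denominator
                called_alleles[variant_id] += 1
                if allele != "0":
                    alt_alleles[variant_id] += 1
    return {v: alt_alleles[v] / n for v, n in called_alleles.items()}


# Hypothetical trio with two variants.
cohort = {
    "proband": {"chr1:1000:A>G": "1/1", "chr2:500:C>T": "0/1"},
    "mother": {"chr1:1000:A>G": "0/1", "chr2:500:C>T": "0/0"},
    "father": {"chr1:1000:A>G": "0/1", "chr2:500:C>T": "./."},
}
freqs = internal_allele_frequencies(cohort)
for variant_id, freq in sorted(freqs.items()):
    print(f"{variant_id}\t{freq:.3f}")
```

Missing calls are excluded from the denominator here, so a variant's frequency reflects only the samples in which it was actually genotyped; other conventions are possible.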

CONCLUSIONS

We developed VarGenius, a "master" tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way to a different kind of analysis, centered on cohorts rather than on singletons. Patient and variant information is stored in the database, and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while giving computational biologists additional flexibility to develop their own algorithms for the comparison and analysis of data. The software is freely available at: https://github.com/frankMusacchia/VarGenius.
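As an illustration of the cohort-centered, programmatic access described above, the following minimal sketch counts affected carriers per variant with a SQL query. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the table names, columns, and data are hypothetical and do not reflect the actual VarGenius PostgreSQL schema.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout (assumption,
# not the VarGenius schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE samples (sample_id TEXT PRIMARY KEY, family_id TEXT, affected INTEGER);
    CREATE TABLE genotypes (sample_id TEXT, variant_id TEXT, gene TEXT, genotype TEXT);
    """
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("P1", "fam1", 1), ("P2", "fam2", 1), ("M1", "fam1", 0)],
)
conn.executemany(
    "INSERT INTO genotypes VALUES (?, ?, ?, ?)",
    [
        ("P1", "v1", "GENE_A", "0/1"),
        ("P2", "v1", "GENE_A", "1/1"),
        ("M1", "v1", "GENE_A", "0/1"),
        ("P1", "v2", "GENE_B", "0/1"),
        ("P2", "v2", "GENE_B", "0/0"),
    ],
)


def affected_carriers_per_variant(connection):
    """Return (variant_id, gene, carrier count) for affected samples only."""
    query = """
        SELECT g.variant_id, g.gene, COUNT(DISTINCT g.sample_id) AS carriers
        FROM genotypes AS g
        JOIN samples AS s ON s.sample_id = g.sample_id
        WHERE s.affected = 1 AND g.genotype IN ('0/1', '1/1')
        GROUP BY g.variant_id, g.gene
        ORDER BY carriers DESC, g.variant_id
    """
    return connection.execute(query).fetchall()


rows = affected_carriers_per_variant(conn)
for variant_id, gene, carriers in rows:
    print(variant_id, gene, carriers)
```

Against a real PostgreSQL instance, the same kind of query would be issued through a PostgreSQL driver and would need to be adapted to the tables the tool actually creates.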


