Department of Human Genetics, University of Utah, Salt Lake City, UT, USA.
Present address: IDbyDNA Inc., San Francisco, CA, USA.
BMC Bioinformatics. 2018 Feb 20;19(1):57. doi: 10.1186/s12859-018-2056-y.
Prioritization of sequence variants for diagnosis and discovery of Mendelian diseases is challenging, especially in large collections of whole genome sequences (WGS). Fast, scalable solutions are needed for discovery research, for clinical applications, and for curation of massive public variant repositories such as dbSNP and gnomAD. In response, we have developed VVP, the VAAST Variant Prioritizer. VVP is ultrafast, scales to even the largest variant repositories and genome collections, and its outputs are designed to simplify clinical interpretation of variants of uncertain significance.
We show that scoring the entire contents of dbSNP (> 155 million variants) requires only 95 min on a machine with 4 CPUs and 16 GB of RAM, and that a 60X WGS can be processed in less than 5 min. We also demonstrate that VVP can score variants anywhere in the genome, regardless of type, effect, or location. It does so by integrating sequence conservation, the type of sequence change, allele frequencies, variant burden, and zygosity. Finally, we show that VVP scores are consistently accurate and easily interpreted, traits not shared by many commonly used tools such as SIFT and CADD.
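The dbSNP benchmark implies an average scoring rate that can be checked with a quick back-of-envelope calculation. The sketch below is illustrative only and is not part of VVP. The per-CPU figure assumes the work was spread evenly across all four CPUs, which the benchmark does not state. Because the variant count is given as a lower bound (> 155 million), the computed rates are lower bounds as well.

```python
# Back-of-envelope throughput implied by the dbSNP benchmark:
# >155 million variants scored in 95 min on a 4-CPU machine.
# Assumption (not from the benchmark): perfectly even use of all 4 CPUs.

def variants_per_second(n_variants: int, minutes: float) -> float:
    """Average scoring rate over the whole run."""
    return n_variants / (minutes * 60)

DBSNP_VARIANTS = 155_000_000  # lower bound quoted for dbSNP
RUNTIME_MIN = 95
N_CPUS = 4

overall = variants_per_second(DBSNP_VARIANTS, RUNTIME_MIN)
per_cpu = overall / N_CPUS  # only meaningful under the even-load assumption

print(f"{overall:,.0f} variants/s overall")  # 27,193 variants/s overall
print(f"{per_cpu:,.0f} variants/s per CPU")  # 6,798 variants/s per CPU
```

In other words, the benchmark corresponds to more than 27,000 variants scored per second on commodity hardware.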
VVP provides a rapid and scalable means to prioritize any sequence variant, anywhere in the genome, and its scores are designed to facilitate variant interpretation using ACMG and NHS guidelines. These traits make it well suited to very large collections of whole genome sequences.