Improved VCF normalization for accurate VCF comparison.

Author information

Bayat Arash, Gaëta Bruno, Ignjatovic Aleksandar, Parameswaran Sri

Publication information

Bioinformatics. 2017 Apr 1;33(7):964-970. doi: 10.1093/bioinformatics/btw748.

Abstract

MOTIVATION

The Variant Call Format (VCF) is widely used to store data about genetic variation. Variant calling workflows detect potential variants in large numbers of short sequence reads generated by DNA sequencing and report them in VCF format. To evaluate the accuracy of variant callers, it is critical to correctly compare their output against a reference VCF file containing a gold standard set of variants. However, comparing VCF files is complicated because a single genomic variant can be represented in several different ways, so different software tools do not necessarily report it identically.

RESULTS

We introduce a VCF normalization method called Best Alignment Normalisation (BAN) that results in more accurate VCF file comparison. BAN applies all the variants in a VCF file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back to the reference genome. Since the purpose of BAN is to obtain an accurate result when VCF files are compared, we define a better normalization method as one that results in less disagreement between the outputs of different VCF comparators.
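
The apply-then-recall idea can be illustrated with a toy sketch. This is not the BAN implementation, which is a bash script that relies on external software. The code below makes several assumptions: variants are plain (pos, ref, alt) tuples with 1-based positions on a single short sequence, the hypothetical helpers `apply_variants` and `recall_variants` stand in for the two steps, and Python's `difflib.SequenceMatcher` stands in for a real genome aligner. It shows that two different representations of the same deletion yield the same sample sequence, and therefore the same recalled variants.

```python
from difflib import SequenceMatcher


def apply_variants(reference, variants):
    """Apply (pos, ref, alt) variants (1-based pos) to a reference string."""
    sample = reference
    # Work from right to left so that earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if sample[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        sample = sample[:start] + alt + sample[start + len(ref):]
    return sample


def recall_variants(reference, sample):
    """Re-derive variants by aligning the sample back to the reference.

    difflib is only a stand-in for a sequence aligner (an assumption of
    this sketch); it makes no claim to find a biologically best alignment.
    """
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            variants.append((i1 + 1, reference[i1:i2], sample[j1:j2]))
        elif i1 > 0:
            # Insertion or deletion: anchor on the preceding reference base.
            anchor = reference[i1 - 1]
            variants.append(
                (i1, anchor + reference[i1:i2], anchor + sample[j1:j2])
            )
        else:
            # Indel at the very start: anchor on the following base
            # (assumes the reference has a base after the deleted span).
            anchor = reference[i2]
            variants.append(
                (1, reference[i1:i2] + anchor, sample[j1:j2] + anchor)
            )
    return variants


reference = "TCACACAG"
calls_a = [(1, "TCA", "T")]  # deletion of "CA", written at the left end
calls_b = [(3, "ACA", "A")]  # the same deletion, written two bases later

sample_a = apply_variants(reference, calls_a)
sample_b = apply_variants(reference, calls_b)
normalized_a = recall_variants(reference, sample_a)
normalized_b = recall_variants(reference, sample_b)

print(calls_a == calls_b)            # the raw records differ
print(sample_a == sample_b)          # but the sample sequences agree
print(normalized_a == normalized_b)  # so the recalled variants agree
```

Because both call sets are reduced to the sequence they imply before variants are recalled, the comparison no longer depends on how each caller chose to write the deletion.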

AVAILABILITY AND IMPLEMENTATION

The BAN Linux bash script, along with the required software, is publicly available at https://sites.google.com/site/banadf16.

CONTACT

A.Bayat@unsw.edu.au.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
