vcfR: a package to manipulate and visualize variant call format data in R.
Author information
Knaus Brian J, Grünwald Niklaus J
Affiliations
Horticultural Crops Research Unit, USDA-ARS, Corvallis, OR, 97330, USA.
Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA.
Publication information
Mol Ecol Resour. 2017 Jan;17(1):44-53. doi: 10.1111/1755-0998.12549. Epub 2016 Jul 12.
Software to call single-nucleotide polymorphisms or related genetic variants has converged on the variant call format (VCF) as the output format of choice. This has created a need for tools to work with VCF files. While a growing number of programs can read VCF data, many extract only the genotypes and omit the data associated with each genotype that describe its quality. We created the R package vcfR to address this issue. We implemented this VCF file exploration tool in the R language because R provides an interactive experience and an environment that is commonly used for genetic data analysis. vcfR implements functions to read VCF files into R and write them back out, to extract portions of the data, and to plot summary statistics of the data. vcfR further provides the ability to visualize how various parameterizations of the data affect the results. Additional tools integrate sequence (FASTA) and annotation (GFF) data for visualization of genomic regions such as chromosomes. Conversion functions translate data from the vcfR data structure to formats used by other R genetics packages. Computationally intensive functions are implemented in C++ to improve performance. These tools are intended to facilitate VCF data exploration, including intuitive methods for data quality control and easy export to other R packages for further analysis. vcfR thus provides essential, novel tools currently not available in R.
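To make concrete what "the data associated with each genotype that describe its quality" means, the following is a minimal Python sketch, not vcfR's implementation. It parses a single made-up VCF data line, pulls out one FORMAT element per sample, and masks genotypes below a depth threshold. The function names (`parse_vcf_line`, `extract_element`, `mask_low_depth`), the sample names, and the example record are all hypothetical and exist only for illustration. vcfR itself offers this kind of extraction in R through functions such as `extract.gt`.

```python
# Illustrative only: a tiny pure-Python parser, not vcfR's code.
# All function names and the example record below are hypothetical.

def parse_vcf_line(line, sample_names):
    """Split one VCF data line into fixed fields and per-sample FORMAT fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, _info, fmt = fields[:9]
    keys = fmt.split(":")  # e.g. ["GT", "DP", "GQ"]
    samples = {}
    for name, column in zip(sample_names, fields[9:]):
        values = column.split(":")
        # Trailing FORMAT fields may be dropped in VCF; pad them as missing (".").
        values += ["."] * (len(keys) - len(values))
        samples[name] = dict(zip(keys, values))
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "samples": samples}


def extract_element(record, element):
    """Return one FORMAT element (e.g. "GT" or "DP") for every sample."""
    return {name: data.get(element, ".")
            for name, data in record["samples"].items()}


def mask_low_depth(record, min_dp):
    """Return genotypes, setting those with missing or low depth to "./."."""
    masked = {}
    for name, data in record["samples"].items():
        dp = data.get("DP", ".")
        keep = dp != "." and int(dp) >= min_dp
        masked[name] = data.get("GT", "./.") if keep else "./."
    return masked


# A made-up header and record with three samples; sample3 omits DP and GQ.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\tsample3"
LINE = "chr1\t1234\t.\tA\tG\t50\tPASS\t.\tGT:DP:GQ\t0/1:35:99\t1/1:3:12\t0/0"

sample_names = HEADER.split("\t")[9:]
record = parse_vcf_line(LINE, sample_names)

print(extract_element(record, "GT"))      # genotypes only
print(extract_element(record, "DP"))      # per-genotype read depth
print(mask_low_depth(record, min_dp=10))  # genotypes after a depth filter
```

Rerunning `mask_low_depth` with different `min_dp` values is a small-scale version of exploring how different parameterizations of the data affect the results: a tool that keeps only the `GT` field cannot apply such a filter at all.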