A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar.

Author information

Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America.

Pacific Biosciences, San Diego, California, United States of America.

Publication information

PLoS Comput Biol. 2022 May 31;18(5):e1009123. doi: 10.1371/journal.pcbi.1009123. eCollection 2022 May.

Abstract

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are used for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as for output of statistics, visualisation, and transformation of variant files. They run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how more complex variation that cannot easily be represented in VCF can be addressed through pangenome graph formats.
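
As a rough illustration of the variant classes the abstract lists, the sketch below classifies VCF data lines by comparing the REF and ALT columns. It uses only the Python standard library and none of the tools named above. The helper names `classify` and `parse_record` and the example records are hypothetical, made up for this illustration, and the length-based rules are a simplification rather than the method used by any of the projects.

```python
# Minimal sketch: coarse variant classification from the REF and ALT columns.
# Helper names and example records are hypothetical; they are not taken from
# vcflib, bio-vcf, cyvcf2, hts-nim or slivar.

def classify(ref: str, alt: str) -> str:
    """Return a coarse variant class for one REF/ALT pair."""
    if alt in {".", "*"}:
        return "other"        # missing ALT or spanning-deletion placeholder
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"           # symbolic allele or breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


def parse_record(line: str) -> list[tuple[str, int, str, str, str]]:
    """Split one tab-separated VCF data line into one tuple per ALT allele."""
    chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
    return [(chrom, int(pos), ref, alt, classify(ref, alt))
            for alt in alts.split(",")]


# Made-up records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
EXAMPLE_LINES = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAC\tGT\t50\tPASS\t.",
    "chr1\t300\t.\tA\tATT\t50\tPASS\t.",
    "chr1\t400\t.\tGTC\tG\t50\tPASS\t.",
    "chr1\t500\t.\tN\t<DEL>\t50\tPASS\tSVTYPE=DEL;END=900",
]

if __name__ == "__main__":
    for line in EXAMPLE_LINES:
        for record in parse_record(line):
            print(*record, sep="\t")
```

Each ALT allele on a multi-allelic line is classified separately, so a single record can yield more than one class.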

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/5c4bcfdf4dec/pcbi.1009123.g001.jpg
