A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar.

Affiliations

Department Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America.

Pacific Biosciences, San Diego, California, United States of America.

Publication information

PLoS Comput Biol. 2022 May 31;18(5):e1009123. doi: 10.1371/journal.pcbi.1009123. eCollection 2022 May.

DOI:10.1371/journal.pcbi.1009123
PMID:35639788
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9286226/
Abstract

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary, free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are used for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as for output of statistics, visualisation, and transformation of variant files. They run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how more complex variation that cannot easily be represented in VCF can be addressed through pangenome graph formats.
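
To make the variant classes named in the abstract concrete, the following is a minimal sketch in plain Python of how a single tab-delimited VCF data line might be split into alleles and coarsely classified by REF/ALT length. It is not taken from vcflib, bio-vcf, cyvcf2, hts-nim or slivar; the names `Variant`, `parse_vcf_line` and `classify` are hypothetical, and the length-based rules are a simplifying assumption that ignores normalisation and the finer points of the VCF specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    chrom: str
    pos: int   # 1-based position on the reference
    ref: str   # reference allele
    alt: str   # one alternate allele


def parse_vcf_line(line: str) -> List[Variant]:
    """Split one VCF data line into one Variant per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [Variant(chrom, int(pos), ref, alt) for alt in alts.split(",")]


def classify(v: Variant) -> str:
    """Coarse classification from allele lengths (illustrative assumption)."""
    if v.alt in {".", "*"}:
        return "other"       # missing ALT or spanning-deletion placeholder
    if v.alt.startswith("<") or "[" in v.alt or "]" in v.alt:
        return "structural"  # symbolic allele or breakend notation
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) == len(v.alt):
        return "MNV"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


if __name__ == "__main__":
    # Made-up example record with two ALT alleles.
    record = "20\t1234567\tmicrosat1\tGTC\tG,GTCT\t50\tPASS\tNS=3"
    for variant in parse_vcf_line(record):
        print(variant.pos, variant.ref, variant.alt, classify(variant))
```

For real analyses, a dedicated parser such as the libraries presented in the paper would be the more robust choice, since hand-rolled parsing does not handle headers, compressed input, or genotype fields.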


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/5c4bcfdf4dec/pcbi.1009123.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/a5ea47265f25/pcbi.1009123.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/bc3202072494/pcbi.1009123.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/2e846ee73721/pcbi.1009123.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/39d6622365e5/pcbi.1009123.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/49f7baaa51e8/pcbi.1009123.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/d15331b7bc39/pcbi.1009123.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/e7b98fa4f088/pcbi.1009123.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/8a3fe69ae2e6/pcbi.1009123.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/8dedfb234389/pcbi.1009123.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a27c/9286226/0a0dd05f33d1/pcbi.1009123.g011.jpg
