vcfgl: a flexible genotype likelihood simulator for VCF/BCF files.

Authors

Altinkaya Isin, Nielsen Rasmus, Korneliussen Thorfinn Sand

Affiliations

Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen K, 1350, Denmark.

Departments of Integrative Biology and Statistics, University of California, Berkeley, CA, 94720, United States.

Publication information

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf098.

DOI:10.1093/bioinformatics/btaf098
PMID:40045175
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11968323/
Abstract

MOTIVATION

Accurate quantification of genotype uncertainty is pivotal in ensuring the reliability of genetic inferences drawn from NGS data. Genotype uncertainty is typically modeled using Genotype Likelihoods (GLs), which can help propagate measures of statistical uncertainty in base calls to downstream analyses. However, the effects of errors and biases in the estimation of GLs, introduced by biases in the original base call quality scores or the discretization of quality scores, as well as the choice of the GL model, remain under-explored.

RESULTS

We present vcfgl, a versatile tool for simulating genotype likelihoods associated with simulated read data. It offers a framework for researchers to simulate and investigate the uncertainties and biases associated with the quantification of uncertainty, thereby facilitating a deeper understanding of their impacts on downstream analytical methods. Through simulations, we demonstrate the utility of vcfgl in benchmarking GL-based methods. The program can calculate GLs using various widely used genotype likelihood models and can simulate the errors in quality scores using a Beta distribution. It is compatible with modern simulators such as msprime and SLiM, and can output data in pileup, Variant Call Format (VCF)/BCF, and genomic VCF file formats, supporting a wide range of applications. The vcfgl program is freely available as an efficient and user-friendly software written in C/C++.

AVAILABILITY AND IMPLEMENTATION

vcfgl is freely available at https://github.com/isinaltinkaya/vcfgl.
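
The abstract mentions two mechanisms: computing GLs under a genotype likelihood model, and simulating errors in quality scores with a Beta distribution. The following is a minimal Python sketch of that idea, not vcfgl's implementation. It assumes a simple independent-error model in which each read samples one of the two alleles with probability 1/2 and a miscalled base is equally likely to be any of the other three. It also assumes that bases are miscalled at a fixed true rate while the reported per-base error probability is drawn from a Beta distribution centred on that rate. All function and parameter names (`beta_params`, `simulate_reads`, `log10_gl`, `err_mean`, `err_var`) are hypothetical.

```python
import math
import random
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, ...
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def beta_params(mean, var):
    """Convert a mean and variance into Beta(alpha, beta) shape parameters."""
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < var < mean * (1 - mean)")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def simulate_reads(genotype, depth, err_mean, err_var, rng):
    """Return (observed_base, reported_error_prob) pairs for one site.

    Assumption of this sketch: bases are miscalled at the true rate
    err_mean, while the reported error probability is Beta-distributed
    around err_mean, so the reported quality can be off.
    """
    alpha, beta = beta_params(err_mean, err_var)
    reads = []
    for _ in range(depth):
        true_base = rng.choice(genotype)
        if rng.random() < err_mean:
            observed = rng.choice([b for b in BASES if b != true_base])
        else:
            observed = true_base
        reported = rng.betavariate(alpha, beta)
        reads.append((observed, reported))
    return reads


def log10_gl(reads, genotype):
    """log10 P(reads | genotype) under the independent-error model."""
    total = 0.0
    for base, e in reads:
        e = min(max(e, 1e-12), 1.0 - 1e-12)  # guard against log10(0)
        p = sum(0.5 * ((1.0 - e) if base == allele else e / 3.0)
                for allele in genotype)
        total += math.log10(p)
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    reads = simulate_reads(("A", "C"), depth=10, err_mean=0.01,
                           err_var=1e-5, rng=rng)
    for g in GENOTYPES:
        print("".join(g), round(log10_gl(reads, g), 3))
```

Raising `err_var` widens the gap between the reported and true error rates, which is one way to probe how sensitive a GL-based method is to miscalibrated quality scores.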

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e49/11968323/9c4755100a40/btaf098f1.jpg
