Benchmarking the performance of Pool-seq SNP callers using simulated and real sequencing data.

Affiliation

Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain.

Publication information

Mol Ecol Resour. 2021 May;21(4):1216-1229. doi: 10.1111/1755-0998.13343. Epub 2021 Mar 5.

DOI:10.1111/1755-0998.13343

PMID:33534960

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8251607/

Abstract

Population genomics is a fast-developing discipline with promising applications in a growing number of life sciences fields. Advances in sequencing technologies and bioinformatics tools allow population genomics to exploit genome-wide information to identify the molecular variants underlying traits of interest and the evolutionary forces that modulate these variants through space and time. However, the cost of genomic analyses of multiple populations is still too high to address them through individual genome sequencing. Pooling individuals for sequencing can be a more effective strategy in Single Nucleotide Polymorphism (SNP) detection and allele frequency estimation because of a higher total coverage. However, compared to individual sequencing, SNP calling from pools has the additional difficulty of distinguishing rare variants from sequencing errors, which is often avoided by establishing a minimum threshold allele frequency for the analysis. Finding an optimal balance between minimizing information loss and reducing sequencing costs is essential to ensure the success of population genomics studies. Here, we have benchmarked the performance of SNP callers for Pool-seq data, based on different approaches, under different conditions, and using computer simulations and real data. We found that SNP caller performance varied for allele frequencies up to 0.35. We also found that SNP callers based on Bayesian (SNAPE-pooled) or maximum likelihood (MAPGD) approaches outperform the two heuristic callers tested (VarScan and PoolSNP), in terms of the balance between sensitivity and FDR, in both simulated and real sequencing data. Our results will help inform the selection of the most appropriate SNP caller not only for large-scale population studies but also in cases where the Pool-seq strategy is the only option, such as in metagenomic or polyploid studies.
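The abstract compares callers by the balance between sensitivity and false discovery rate (FDR). As a minimal sketch of how those two quantities are commonly computed when a truth set is available (as in a simulation), the Python snippet below compares a set of called SNP positions with a set of true positions. The function name `sensitivity_and_fdr`, the `(chromosome, position)` representation, and the toy data are illustrative assumptions, not taken from the paper or from any of the callers it tests.

```python
def sensitivity_and_fdr(true_snps, called_snps):
    """Return (sensitivity, FDR) for a call set compared with a truth set.

    Both arguments are iterables of hashable SNP identifiers; here we
    assume (chromosome, position) tuples purely for illustration.
    """
    true_snps, called_snps = set(true_snps), set(called_snps)
    tp = len(true_snps & called_snps)   # true SNPs that were called
    fp = len(called_snps - true_snps)   # calls that are not true SNPs
    fn = len(true_snps - called_snps)   # true SNPs that were missed
    # Sensitivity: fraction of true SNPs recovered by the caller.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # FDR: fraction of calls that are false positives.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, fdr


# Hypothetical toy data: four true SNPs, four calls, three in common.
truth = {("2L", 100), ("2L", 250), ("2L", 900), ("3R", 40)}
calls = {("2L", 100), ("2L", 250), ("3R", 40), ("3R", 77)}

sens, fdr = sensitivity_and_fdr(truth, calls)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In this toy case three of the four true SNPs are recovered and one of the four calls is spurious. Raising a minimum allele frequency threshold would typically remove spurious low-frequency calls, lowering the FDR, while also discarding genuine rare variants, lowering sensitivity. That is the trade-off the abstract describes.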

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9ad/8251607/bd2a395e6cb0/MEN-21-1216-g001.jpg
