SNP calling by sequencing pooled samples.

Affiliation

Centro Nacional de Análisis Genómico, Parc Científic de Barcelona, Barcelona, 08028, Spain.

Publication details

BMC Bioinformatics. 2012 Sep 20;13:239. doi: 10.1186/1471-2105-13-239.

DOI: 10.1186/1471-2105-13-239
PMID: 22992255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3475117/
Abstract

BACKGROUND

Performing high-throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost of individual sequencing. In certain circumstances, some variability estimators even have lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may be much more relevant than in individual SNP calling: in individual sequencing their impact can be reduced by requiring a minimum number of reads per allele, but in pools this restriction would have a strong and undesired effect, because alleles at low frequency in the pool are unlikely to be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences from, e.g., cancer tissue), but this is not true in pools: under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants, because P(f) ∝ 1/f, and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool; conversely, a true singleton may be covered by more than one read or, more likely, by none.
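The second and third points can be made concrete with a short calculation. This is a minimal sketch, not code from snape: it assumes error-free binomial sampling of reads from the pool and a hypothetical pool of n = 50 chromosomes sequenced at 50x, and it uses the neutral expectation that the number of segregating sites with derived-allele count k in a sample of n chromosomes is proportional to 1/k.

```python
from math import comb


def neutral_sample_prior(n):
    """Probability that a segregating site has derived-allele count k (1..n-1)
    in a sample of n chromosomes under the standard neutral model, where the
    expected number of sites with count k is proportional to 1/k."""
    weights = {k: 1.0 / k for k in range(1, n)}
    harmonic = sum(weights.values())  # a_n = 1 + 1/2 + ... + 1/(n-1)
    return {k: w / harmonic for k, w in weights.items()}


def read_count_pmf(depth, freq):
    """P(r of `depth` reads carry an allele at pool frequency `freq`), for
    r = 0..depth, ASSUMING error-free binomial sampling of reads from the pool."""
    return [comb(depth, r) * freq**r * (1 - freq) ** (depth - r)
            for r in range(depth + 1)]


n, depth = 50, 50  # hypothetical pool of 50 chromosomes sequenced at 50x
prior = neutral_sample_prior(n)
pmf = read_count_pmf(depth, 1 / n)  # a true singleton has pool frequency 1/n

print(f"singletons among segregating sites: {prior[1]:.3f}")
print(f"P(true singleton seen in 0 reads):   {pmf[0]:.3f}")
print(f"P(true singleton seen in >=2 reads): {1 - pmf[0] - pmf[1]:.3f}")
```

With these parameters the three printed values are roughly 0.223, 0.364 and 0.264: a true singleton is more likely to be missed entirely than to be read twice or more, and a filter requiring at least two reads per allele would discard about three quarters of true singletons, which is the first point above.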

RESULTS

To improve on existing theory and software packages, we developed a Bayesian approach for minor allele frequency (MAF) estimation and SNP calling in pools, implemented in a program called snape. The approach takes sequencing errors into account and lets users choose among different priors. We also set up a pipeline that simulates the coalescent process giving rise to the SNPs, the pooling procedure and the sequencing, and used it to compare the performance of snape with that of other packages.
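The kind of Bayesian calculation described above can be illustrated with a small grid posterior over the allele count in the pool. This is a minimal sketch under stated assumptions, not snape's actual model: the site is assumed biallelic, each read is assumed to show the alternative base with probability p_k = (k/n)(1 - e) + (1 - k/n)e for a per-base error rate e, and the θ/k "neutral-style" prior, the flat alternative and all parameter values are hypothetical choices made for illustration.

```python
from math import exp, log


def neutral_prior(n, theta=0.001):
    """HYPOTHETICAL prior over the count k (0..n) of the alternative allele in a
    pool of n chromosomes: P(k) = theta/k for 1 <= k <= n-1, P(n) = 0, and the
    remaining mass on k = 0 (monomorphic site). Requires theta * a_n < 1."""
    prior = [0.0] * (n + 1)
    for k in range(1, n):
        prior[k] = theta / k
    prior[0] = 1.0 - sum(prior)
    return prior


def flat_prior(n):
    """Alternative uninformative prior: every count 0..n equally likely."""
    return [1.0 / (n + 1)] * (n + 1)


def posterior(alt_reads, depth, n, error, prior):
    """Posterior P(k | reads) for k = 0..n, with 0 < error < 1. ASSUMES reads are
    sampled independently, each showing the alternative base with probability
    p_k = (k/n)(1 - error) + (1 - k/n) * error."""
    log_post = []
    for k, pk in enumerate(prior):
        if pk == 0.0:
            log_post.append(float("-inf"))  # count excluded by the prior
            continue
        f = k / n
        p = f * (1 - error) + (1 - f) * error  # stays in [error, 1 - error]
        log_post.append(log(pk) + alt_reads * log(p)
                        + (depth - alt_reads) * log(1 - p))
    top = max(log_post)
    weights = [exp(x - top) for x in log_post]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


n, depth, error = 50, 50, 0.01  # hypothetical pool size, coverage, error rate
for name, prior in [("neutral", neutral_prior(n)), ("flat", flat_prior(n))]:
    post = posterior(alt_reads=2, depth=depth, n=n, error=error, prior=prior)
    p_seg = 1.0 - post[0] - post[n]  # probability the site is segregating
    mean_f = sum(k / n * pk for k, pk in enumerate(post))
    print(f"{name:8s} P(segregating) = {p_seg:.3f}  posterior mean f = {mean_f:.4f}")
```

Besides a call, the sketch returns the whole posterior over k, which mirrors the abstract's point that the method reports both the probability that a site is segregating and the full posterior distribution of f. With only two supporting reads, the flat prior gives a much higher P(segregating) than the θ/k prior, which shows why the choice of prior matters for low-frequency variants in pools.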

CONCLUSIONS

We present software that helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. To test the behaviour of our software, we generated artificial genomes through simulated coalescence, computed the effect of a pooled sequencing protocol, and then called SNPs. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation and VarScan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).
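For readers comparing the quoted figures, here is a minimal sketch assuming the standard definitions: power is the fraction of true (simulated) SNPs that are called, and FDR is the fraction of calls that are not true SNPs. The function name and the toy positions are hypothetical and are not data from the paper, whose exact matching rules between calls and simulated SNPs may differ.

```python
def power_and_fdr(called, true_snps):
    """Power: fraction of true SNPs that were called.
    FDR: fraction of calls that are not true SNPs."""
    called, true_snps = set(called), set(true_snps)
    true_pos = len(called & true_snps)
    false_pos = len(called - true_snps)
    power = true_pos / len(true_snps) if true_snps else 0.0
    fdr = false_pos / len(called) if called else 0.0
    return power, fdr


# Toy positions, not data from the paper: 20 simulated SNPs, 8 calls of which 1 is false.
true_positions = range(1000, 1020)
called_positions = list(range(1000, 1007)) + [5000]
power, fdr = power_and_fdr(called_positions, true_positions)
print(f"power = {power:.2f}, FDR = {fdr:.3f}")  # power = 0.35, FDR = 0.125
```

Because the two measures have different denominators (true SNPs versus calls), a caller can have modest power and still a low FDR, which is the trade-off the abstract reports.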

References

1
Neutrality tests for sequences with missing data.
Genetics. 2012 Aug;191(4):1397-401. doi: 10.1534/genetics.112.139949. Epub 2012 Jun 1.
2
ART: a next-generation sequencing read simulator.
Bioinformatics. 2012 Feb 15;28(4):593-4. doi: 10.1093/bioinformatics/btr708. Epub 2011 Dec 23.
3
PLoS One. 2011 Apr 4;6(4):e14782. doi: 10.1371/journal.pone.0014782.
4
PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals.
PLoS One. 2011 Jan 6;6(1):e15925. doi: 10.1371/journal.pone.0015925.
5
Massive parallel sequencing in animal genetics: wherefroms and wheretos.
Anim Genet. 2010 Dec;41(6):561-9. doi: 10.1111/j.1365-2052.2010.02057.x.
6
The next generation of molecular markers from massively parallel sequencing of pooled DNA samples.
Genetics. 2010 Sep;186(1):207-18. doi: 10.1534/genetics.110.114397. Epub 2010 May 10.
7
Fast and accurate long-read alignment with Burrows-Wheeler transform.
Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15.
8
VarScan: variant detection in massively parallel sequencing of individual and pooled samples.
Bioinformatics. 2009 Sep 1;25(17):2283-5. doi: 10.1093/bioinformatics/btp373. Epub 2009 Jun 19.
9
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.
10
The Distribution of Gene Frequencies Under Irreversible Mutation.
Proc Natl Acad Sci U S A. 1938 Jul;24(7):253-9. doi: 10.1073/pnas.24.7.253.