Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.

Affiliation

Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA.

Publication information

Genet Epidemiol. 2012 Sep;36(6):549-60. doi: 10.1002/gepi.21648. Epub 2012 Jun 6.

DOI:10.1002/gepi.21648
PMID:22674656
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3477622/
Abstract

Next-generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, for example, equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
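To make the abstract's point concrete, the sketch below shows how two of the named factors (unequal contributions to the pool and reference allele preferential bias), plus a sequencing error rate, can shift a naive read-count estimate of allele frequency. It is an illustrative toy model only, not the Psafe workflow or the authors' simulation design. The function names, the `alt_retention` parameter, and the simple multiplicative bias model are all assumptions introduced here.

```python
import random


def expected_alt_fraction(genotypes, weights=None, alt_retention=1.0, error_rate=0.0):
    """Expected fraction of reads showing the alternate allele at one site.

    Hypothetical toy model (not from the paper):
    genotypes     : alternate-allele count (0, 1 or 2) per diploid subject
    weights       : relative DNA contribution of each subject (None = equal)
    alt_retention : chance that an alt-carrying read is kept, relative to a
                    reference read (1.0 means no reference bias)
    error_rate    : symmetric probability that a base is miscalled
    """
    if weights is None:
        weights = [1.0] * len(genotypes)
    total = sum(weights)
    # True alt-allele fraction in the pooled DNA, weighted by contribution.
    p = sum(w * g / 2 for w, g in zip(weights, genotypes)) / total
    # Reference bias: alt reads are under-represented among retained reads.
    kept_alt = alt_retention * p
    p_biased = kept_alt / (kept_alt + (1 - p))
    # Sequencing error flips a fraction of calls in both directions.
    return p_biased * (1 - error_rate) + (1 - p_biased) * error_rate


def naive_estimate(genotypes, depth, seed=0, **factors):
    """Draw `depth` reads and return the observed alt-read fraction."""
    rng = random.Random(seed)
    p_obs = expected_alt_fraction(genotypes, **factors)
    alt_reads = sum(rng.random() < p_obs for _ in range(depth))
    return alt_reads / depth
```

For a pool with genotypes `[0, 1, 2, 0]`, the true alternate-allele frequency is 0.375. Calling `expected_alt_fraction` with `alt_retention=0.8` or with unequal `weights` returns a value that differs from 0.375, which is the kind of discrepancy between observed and expected data that the abstract describes.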
