Design of association studies with pooled or un-pooled next-generation sequencing data.

Affiliation

Department of Integrative Biology, UC Berkeley, Berkeley, California 94720, USA.

Publication information

Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.

DOI:10.1002/gepi.20501

PMID:20552648

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5001557/

Abstract

Most common hereditary diseases in humans are complex and multifactorial. Large-scale genome-wide association studies based on SNP genotyping have only identified a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (a minor allele frequency, MAF <5%), which are not included in the common genotyping platforms, may contribute substantially to the genetic variation of these diseases. Next-generation sequencing, which would allow the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost-effective protocols for using next-generation sequencing in association mapping studies based on pooled and un-pooled samples, and identify optimal designs with respect to total number of individuals, number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon-capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next-generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that with a fixed cost, sequencing many individuals at a more shallow depth with larger pool size achieves higher power than sequencing a small number of individuals in higher depth with smaller pool size, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next-generation sequencing.
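The abstract does not spell out the likelihood ratio statistic, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single site, a known symmetric per-read error rate `err`, and binomially distributed minor-allele read counts in a case pool and a control pool; it ignores pooling variance and per-individual genotypes. All function and parameter names are hypothetical.

```python
import math


def read_minor_prob(p, err):
    """Probability that a read shows the minor allele, given true
    frequency p and a symmetric per-read error rate err (assumed)."""
    return p * (1.0 - err) + (1.0 - p) * err


def binom_loglik(k, n, q):
    """Binomial log-likelihood without the binomial coefficient,
    which cancels in the likelihood ratio."""
    return k * math.log(q) + (n - k) * math.log(1.0 - q)


def mle_freq(k, n, err):
    """Error-aware MLE of the allele frequency: clamp the observed read
    fraction to [err, 1 - err], then invert read_minor_prob."""
    q_hat = min(max(k / n, err), 1.0 - err)
    return (q_hat - err) / (1.0 - 2.0 * err)


def lrt_pooled(k_case, n_case, k_ctrl, n_ctrl, err=0.01):
    """Likelihood ratio test of equal allele frequency in cases and controls.

    k_* are minor-allele read counts and n_* are total read counts.
    Returns (statistic, p-value) using a chi-square(1) approximation.
    """
    if not 0.0 < err < 0.5:
        raise ValueError("err must lie strictly between 0 and 0.5")
    # Null hypothesis: one shared frequency for both groups.
    p0 = mle_freq(k_case + k_ctrl, n_case + n_ctrl, err)
    q0 = read_minor_prob(p0, err)
    ll_null = binom_loglik(k_case, n_case, q0) + binom_loglik(k_ctrl, n_ctrl, q0)
    # Alternative hypothesis: a separate frequency per group.
    q1 = read_minor_prob(mle_freq(k_case, n_case, err), err)
    q2 = read_minor_prob(mle_freq(k_ctrl, n_ctrl, err), err)
    ll_alt = binom_loglik(k_case, n_case, q1) + binom_loglik(k_ctrl, n_ctrl, q2)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 degree of freedom.
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval
```

Calling `lrt_pooled(60, 1000, 20, 1000)` compares 60 minor-allele reads out of 1000 in cases with 20 out of 1000 in controls. Clamping the read fraction at `err` keeps an allele that is seen only at the error rate from being treated as a real variant, which is the role an error-aware statistic plays at shallow depth.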

Similar articles

Design of association studies with pooled or un-pooled next-generation sequencing data.

Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.

Analysis and optimal design for association studies using next-generation sequencing with case-control pools.

Genet Epidemiol. 2012 Dec;36(8):870-81. doi: 10.1002/gepi.21681. Epub 2012 Sep 12.

Resequencing of pooled DNA for detecting disease associations with rare variants.

Genet Epidemiol. 2010 Jul;34(5):492-501. doi: 10.1002/gepi.20502.

A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms.

BMC Genomics. 2013;14 Suppl 1(Suppl 1):S1. doi: 10.1186/1471-2164-14-S1-S1. Epub 2013 Jan 21.

The efficacy of detecting variants with small effects on the Affymetrix 6.0 platform using pooled DNA.

Hum Genet. 2011 Nov;130(5):607-21. doi: 10.1007/s00439-011-0974-0. Epub 2011 Mar 22.

Large-scale detection of rare variants via pooled multiplexed next-generation sequencing: towards next-generation Ecotilling.

Plant J. 2011 Aug;67(4):736-45. doi: 10.1111/j.1365-313X.2011.04627.x. Epub 2011 Jul 11.

On optimal pooling designs to identify rare variants through massive resequencing.

Genet Epidemiol. 2011 Apr;35(3):139-47. doi: 10.1002/gepi.20561. Epub 2011 Jan 19.

Incorporation of genetic model parameters for cost-effective designs of genetic association studies using DNA pooling.

BMC Genomics. 2007 Jul 16;8:238. doi: 10.1186/1471-2164-8-238.

On the use of DNA pooling to estimate haplotype frequencies.

Genet Epidemiol. 2003 Jan;24(1):74-82. doi: 10.1002/gepi.10195.

Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.

Genet Epidemiol. 2012 Sep;36(6):549-60. doi: 10.1002/gepi.21648. Epub 2012 Jun 6.

Cited by

Evaluation of nine statistics to identify QTLs in bulk segregant analysis using next generation sequencing approaches.

BMC Genomics. 2022 Jul 6;23(1):490. doi: 10.1186/s12864-022-08718-y.

Establishment and Characterization of a Cell Line (S-RMS1) Derived from an Infantile Spindle Cell Rhabdomyosarcoma with Fusion Transcript.

Int J Mol Sci. 2021 May 22;22(11):5484. doi: 10.3390/ijms22115484.

Developmental plasticity shapes social traits and selection in a facultatively eusocial bee.

Proc Natl Acad Sci U S A. 2020 Jun 16;117(24):13615-13625. doi: 10.1073/pnas.2000344117. Epub 2020 May 29.

The GATK joint genotyping workflow is appropriate for calling variants in RNA-seq experiments.

J Anim Sci Biotechnol. 2019 Jun 21;10:44. doi: 10.1186/s40104-019-0359-0. eCollection 2019.

Advances in the genome-wide association study of chronic hepatitis B susceptibility in Asian population.

Eur J Med Res. 2017 Dec 28;22(1):55. doi: 10.1186/s40001-017-0288-3.

Genome-wide analysis of ivermectin response by Onchocerca volvulus reveals that genetic drift and soft selective sweeps contribute to loss of drug sensitivity.

PLoS Negl Trop Dis. 2017 Jul 26;11(7):e0005816. doi: 10.1371/journal.pntd.0005816. eCollection 2017 Jul.

Identification of Sex-determining Loci in Pacific White Shrimp Litopeneaus vannamei Using Linkage and Association Analysis.

Mar Biotechnol (NY). 2017 Jun;19(3):277-286. doi: 10.1007/s10126-017-9749-5. Epub 2017 May 16.

Bulked sample analysis in genetics, genomics and crop improvement.

Plant Biotechnol J. 2016 Oct;14(10):1941-55. doi: 10.1111/pbi.12559. Epub 2016 Apr 28.

Wham: Identifying Structural Variants of Biological Consequence.

PLoS Comput Biol. 2015 Dec 1;11(12):e1004572. doi: 10.1371/journal.pcbi.1004572. eCollection 2015 Dec.

A Scale-Corrected Comparison of Linkage Disequilibrium Levels between Genic and Non-Genic Regions.

PLoS One. 2015 Oct 30;10(10):e0141216. doi: 10.1371/journal.pone.0141216. eCollection 2015.

References

SNP detection for massively parallel whole-genome resequencing.

Genome Res. 2009 Jun;19(6):1124-32. doi: 10.1101/gr.088013.108. Epub 2009 May 6.

A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1).

Nat Genet. 2009 May;41(5):579-84. doi: 10.1038/ng.353. Epub 2009 Mar 29.

Lessons learnt from large-scale exon re-sequencing of the X chromosome.

Hum Mol Genet. 2009 Apr 15;18(R1):R60-4. doi: 10.1093/hmg/ddp071.

Human genetic variation and its contribution to complex traits.

Nat Rev Genet. 2009 Apr;10(4):241-51. doi: 10.1038/nrg2554.

Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes.

Science. 2009 Apr 17;324(5925):387-9. doi: 10.1126/science.1167728. Epub 2009 Mar 5.

Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene.

Science. 2009 Apr 10;324(5924):217. doi: 10.1126/science.1171202. Epub 2009 Mar 5.

Quantification of rare allelic variants from pooled genomic DNA.

Nat Methods. 2009 Apr;6(4):263-5. doi: 10.1038/nmeth.1307. Epub 2009 Mar 1.

Power of deep, all-exon resequencing for discovery of human trait genes.

Proc Natl Acad Sci U S A. 2009 Mar 10;106(10):3871-6. doi: 10.1073/pnas.0812824106. Epub 2009 Feb 6.

A two-stage genome-wide association study of sporadic amyotrophic lateral sclerosis.

Hum Mol Genet. 2009 Apr 15;18(8):1524-32. doi: 10.1093/hmg/ddp059. Epub 2009 Feb 4.

Identification of susceptibility genes for complex diseases using pooling-based genome-wide association scans.

Hum Genet. 2009 Apr;125(3):305-18. doi: 10.1007/s00439-009-0626-9. Epub 2009 Jan 29.
