Extra-binomial variation approach for analysis of pooled DNA sequencing data.

Author information

Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.

Publication information

Bioinformatics. 2012 Nov 15;28(22):2898-904. doi: 10.1093/bioinformatics/bts553. Epub 2012 Sep 12.

DOI:10.1093/bioinformatics/bts553

PMID:22976083

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3496343/

Abstract

MOTIVATION

The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate.
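The over-dispersion described above can be illustrated with a small simulation. This is a minimal sketch, not the method of the 'extraBinomial' package: it assumes a beta-distributed allele frequency per pool as the source of between-pool variation, and it uses a generic Pearson dispersion statistic. The function names, pool count, read depth, and frequency parameters are all hypothetical choices made for illustration.

```python
import random


def simulate_pool_counts(n_pools, depth, mean_freq, concentration, rng):
    """Simulate alternate-allele read counts for a set of pools.

    Assumption: each pool's allele frequency is drawn from
    Beta(mean_freq * concentration, (1 - mean_freq) * concentration).
    Passing concentration=None gives plain binomial sampling.
    """
    counts = []
    for _ in range(n_pools):
        if concentration is None:
            p = mean_freq  # no between-pool variation
        else:
            a = mean_freq * concentration
            b = (1.0 - mean_freq) * concentration
            p = rng.betavariate(a, b)  # pool-specific frequency
        # Binomial draw as a sum of Bernoulli trials (stdlib only).
        counts.append(sum(rng.random() < p for _ in range(depth)))
    return counts


def pearson_dispersion(counts, depths):
    """Pearson chi-square statistic divided by its degrees of freedom.

    Values near 1 are consistent with binomial sampling; values well
    above 1 indicate extra-binomial variation.
    """
    p_hat = sum(counts) / sum(depths)
    if p_hat in (0.0, 1.0):
        return 0.0  # no variation to assess
    chi2 = sum(
        (x - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for x, n in zip(counts, depths)
    )
    return chi2 / (len(counts) - 1)


rng = random.Random(1)
n_pools, depth = 50, 200
depths = [depth] * n_pools

binom_counts = simulate_pool_counts(n_pools, depth, 0.05, None, rng)
over_counts = simulate_pool_counts(n_pools, depth, 0.05, 20.0, rng)

phi_binom = pearson_dispersion(binom_counts, depths)
phi_over = pearson_dispersion(over_counts, depths)
print(f"binomial pools:       dispersion = {phi_binom:.2f}")
print(f"over-dispersed pools: dispersion = {phi_over:.2f}")
```

Under these assumed settings the dispersion for the plain binomial pools should sit near 1, while the pools with beta-distributed frequencies should give a value several times larger. A test that assumes binomial variance would therefore understate the uncertainty in the second case, which is how the false-positive rate becomes inflated.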

RESULTS

We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods.

AVAILABILITY

Package 'extraBinomial' is on http://cran.r-project.org/.

CONTACT

chris.wallace@cimr.cam.ac.uk.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Online.

Similar articles

Extra-binomial variation approach for analysis of pooled DNA sequencing data.

Bioinformatics. 2012 Nov 15;28(22):2898-904. doi: 10.1093/bioinformatics/bts553. Epub 2012 Sep 12.

A statistical method for the detection of variants from next-generation resequencing of DNA pools.

Bioinformatics. 2010 Jun 15;26(12):i318-24. doi: 10.1093/bioinformatics/btq214.

Resequencing of pooled DNA for detecting disease associations with rare variants.

Genet Epidemiol. 2010 Jul;34(5):492-501. doi: 10.1002/gepi.20502.

Beta-Binomial Model for the Detection of Rare Mutations in Pooled Next-Generation Sequencing Experiments.

J Comput Biol. 2017 Apr;24(4):357-367. doi: 10.1089/cmb.2016.0106. Epub 2016 Sep 15.

Design of association studies with pooled or un-pooled next-generation sequencing data.

Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.

An empirical Bayes method for genotyping and SNP detection using multi-sample next-generation sequencing data.

Bioinformatics. 2016 Nov 1;32(21):3240-3245. doi: 10.1093/bioinformatics/btw409. Epub 2016 Jul 4.

Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples.

Genet Epidemiol. 2012 Sep;36(6):549-60. doi: 10.1002/gepi.21648. Epub 2012 Jun 6.

An empirical Bayes mixture model for SNP detection in pooled sequencing data.

Bioinformatics. 2012 Oct 15;28(20):2569-75. doi: 10.1093/bioinformatics/bts501. Epub 2012 Aug 22.

Detection of rare genomic variants from pooled sequencing using SPLINTER.

J Vis Exp. 2012 Jun 23(64):3943. doi: 10.3791/3943.

Multiple testing in genome-wide association studies via hidden Markov models.

Bioinformatics. 2009 Nov 1;25(21):2802-8. doi: 10.1093/bioinformatics/btp476. Epub 2009 Aug 4.

Cited by

GPA: A Microbial Genetic Polymorphisms Assignments Tool in Metagenomic Analysis by Bayesian Estimation.

Genomics Proteomics Bioinformatics. 2019 Feb;17(1):106-117. doi: 10.1016/j.gpb.2018.12.005. Epub 2019 Apr 23.

Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution.

Genetics. 2016 Oct;204(2):723-735. doi: 10.1534/genetics.116.191197. Epub 2016 Aug 19.

Pooled sequencing of 531 genes in inflammatory bowel disease identifies an associated rare variant in BTNL2 and implicates other immune related genes.

PLoS Genet. 2015 Feb 11;11(2):e1004955. doi: 10.1371/journal.pgen.1004955. eCollection 2015 Feb.

References

Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease.

Nat Genet. 2011 Oct 9;43(11):1066-73. doi: 10.1038/ng.952.

A powerful and flexible approach to the analysis of RNA sequence count data.

Bioinformatics. 2011 Oct 1;27(19):2672-8. doi: 10.1093/bioinformatics/btr449. Epub 2011 Aug 2.

Resequencing of pooled DNA for detecting disease associations with rare variants.

Genet Epidemiol. 2010 Jul;34(5):492-501. doi: 10.1002/gepi.20502.

Design of association studies with pooled or un-pooled next-generation sequencing data.

Genet Epidemiol. 2010 Jul;34(5):479-91. doi: 10.1002/gepi.20501.

A statistical method for the detection of variants from next-generation resequencing of DNA pools.

Bioinformatics. 2010 Jun 15;26(12):i318-24. doi: 10.1093/bioinformatics/btq214.

The Beta-Binomial Distribution for Estimating the Number of False Rejections in Microarray Gene Expression Studies.

Comput Stat Data Anal. 2009 Mar 15;53(5):1688-1700. doi: 10.1016/j.csda.2008.01.013.

Fast and accurate long-read alignment with Burrows-Wheeler transform.

Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15.

VarScan: variant detection in massively parallel sequencing of individual and pooled samples.

Bioinformatics. 2009 Sep 1;25(17):2283-5. doi: 10.1093/bioinformatics/btp373. Epub 2009 Jun 19.

The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.

Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes.

Nat Genet. 2009 Jun;41(6):703-7. doi: 10.1038/ng.381. Epub 2009 May 10.
