Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Addenbrooke's Hospital, Cambridge CB2 0XY, UK.
Bioinformatics. 2012 Nov 15;28(22):2898-904. doi: 10.1093/bioinformatics/bts553. Epub 2012 Sep 12.
The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate.
We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model fits the data better than either a standard binomial model or the traditional extra-binomial model proposed by Williams, and that it can analyse both rare and common variants at lower or more variable pool depths than the other methods.
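The over-dispersion described above can be illustrated with a small simulation. The sketch below is not the method proposed in the paper. It assumes a generic beta-binomial mechanism in which each pool has its own variant-allele frequency drawn around a common mean, and it compares the resulting count variance with the plain binomial variance. The function names, the correlation parameter rho, and all numeric values are hypothetical and chosen for illustration only.

```python
import random
import statistics


def simulate_pool_counts(n_pools, depth, p, rho, rng):
    """Simulate variant-allele read counts for n_pools pools.

    rho = 0 gives plain binomial sampling; rho > 0 draws a pool-specific
    allele frequency from a Beta distribution with mean p (beta-binomial).
    """
    counts = []
    for _ in range(n_pools):
        if rho > 0:
            # Beta parameters chosen so that the mean is p and a + b = (1 - rho) / rho
            a = p * (1 - rho) / rho
            b = (1 - p) * (1 - rho) / rho
            p_pool = rng.betavariate(a, b)
        else:
            p_pool = p
        # One Bernoulli draw per read at this site
        counts.append(sum(rng.random() < p_pool for _ in range(depth)))
    return counts


def dispersion_factor(counts, depth, p):
    """Ratio of the observed count variance to the binomial variance."""
    return statistics.variance(counts) / (depth * p * (1 - p))


rng = random.Random(2012)
depth, p, rho = 200, 0.05, 0.02  # illustrative values, not taken from the paper

binomial_counts = simulate_pool_counts(2000, depth, p, 0.0, rng)
pooled_counts = simulate_pool_counts(2000, depth, p, rho, rng)

print("binomial dispersion:", round(dispersion_factor(binomial_counts, depth, p), 2))
print("over-dispersed dispersion:", round(dispersion_factor(pooled_counts, depth, p), 2))
print("theoretical beta-binomial factor:", 1 + (depth - 1) * rho)
```

Under these assumptions the binomial counts should give a dispersion factor near 1, while the beta-binomial counts should give a factor near 1 + (depth - 1) * rho. A test that assumes binomial variance would understate the true variability in the second case, which is how unmodelled over-dispersion can inflate the false-positive rate.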
The R package 'extraBinomial' is available from CRAN at http://cran.r-project.org/.
Supplementary data are available at Bioinformatics Online.