McCall Matthew N, McMurray Helene R, Land Hartmut, Almudevar Anthony
Department of Biostatistics and Computational Biology, Department of Biomedical Genetics and James P Wilmot Cancer Center, University of Rochester Medical Center, Rochester, NY 14642, USA.
Bioinformatics. 2014 Aug 15;30(16):2310-6. doi: 10.1093/bioinformatics/btu239. Epub 2014 Apr 23.
Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research on qPCR laboratory protocols, normalization and statistical analysis, little attention has been given to qPCR non-detects: reactions that fail to produce a minimum amount of signal.
We show that the common methods of handling qPCR non-detects lead to biased inference. Furthermore, we show that non-detects are not missing completely at random (MCAR) and are likely missing not at random (MNAR). We propose a model of the missing data mechanism and develop a method that directly models non-detects as missing data. Finally, we show that our approach yields a sizeable reduction in bias when estimating both absolute and differential gene expression.
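To make the bias claim and the missing-data idea concrete, here is a minimal Python simulation sketch. It rests on assumptions that do not come from the abstract: true Ct values are normal (mean 35, SD 2), the probability of a non-detect follows a hypothetical logistic curve in the true Ct (`B0 = -55.5`, `B1 = 1.5`, i.e. 50% non-detects at Ct 37), and these mechanism parameters are treated as known. It compares two common handling choices, dropping non-detects and substituting a Ct of 40, with an EM estimate that treats non-detects as missing data under the assumed MNAR mechanism. The helper names (`simulate`, `missing_moments`, `em_estimate`) are hypothetical and are not the nondetects package API; the package's own model and estimation procedure may differ.

```python
import math
import random
import statistics

# Hypothetical MNAR mechanism (an assumption, not the paper's fitted model):
# P(non-detect | true Ct) = logistic(B0 + B1 * Ct), rising with Ct.
B0, B1 = -55.5, 1.5   # 50% non-detect at Ct = 37
MAX_CYCLE = 40.0      # value used by the "substitute 40" convention


def logistic(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, mu, sigma, seed=1):
    """Return (true_cts, observed_cts, n_missing) under the assumed mechanism."""
    rng = random.Random(seed)
    true_cts = [rng.gauss(mu, sigma) for _ in range(n)]
    # A reaction is observed with probability 1 - P(non-detect | true Ct)
    observed = [ct for ct in true_cts if rng.random() >= logistic(B0 + B1 * ct)]
    return true_cts, observed, n - len(observed)


def missing_moments(mu, sigma, n_grid=2001):
    """E[Ct | non-detect] and E[Ct^2 | non-detect] via a Riemann sum on mu +/- 8 sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    step = (hi - lo) / (n_grid - 1)
    w0 = w1 = w2 = 0.0
    for i in range(n_grid):
        x = lo + i * step
        # Unnormalized normal density times P(non-detect | x); constants cancel in the ratios
        w = math.exp(-0.5 * ((x - mu) / sigma) ** 2) * logistic(B0 + B1 * x)
        w0 += w
        w1 += w * x
        w2 += w * x * x
    return w1 / w0, w2 / w0


def em_estimate(observed, n_missing, tol=1e-8, max_iter=1000):
    """EM for (mu, sigma) of a normal Ct distribution, treating non-detects
    as missing data under the (assumed known) mechanism above."""
    n = len(observed) + n_missing
    s1 = sum(observed)
    s2 = sum(x * x for x in observed)
    mu, sigma = statistics.fmean(observed), statistics.pstdev(observed)
    for _ in range(max_iter):
        # E-step: expected sufficient statistics contributed by the non-detects
        m1, m2 = missing_moments(mu, sigma)
        # M-step: normal MLE from the completed sufficient statistics
        new_mu = (s1 + n_missing * m1) / n
        new_sigma = math.sqrt((s2 + n_missing * m2) / n - new_mu ** 2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


true_cts, observed, n_missing = simulate(5000, mu=35.0, sigma=2.0)
full_mean = statistics.fmean(true_cts)   # oracle: mean if nothing were missing
drop_mean = statistics.fmean(observed)   # common method 1: drop non-detects
sub_mean = (sum(observed) + n_missing * MAX_CYCLE) / len(true_cts)  # method 2: set to 40
em_mean, em_sd = em_estimate(observed, n_missing)

print(f"non-detects: {n_missing}/{len(true_cts)}")
print(f"complete-data mean {full_mean:.2f}, drop {drop_mean:.2f}, "
      f"substitute-40 {sub_mean:.2f}, EM {em_mean:.2f}")
```

Because the high-Ct (low-expression) reactions are the ones most likely to fail under this assumed mechanism, dropping them pulls the mean Ct down, while substituting 40 pushes it up whenever the missing reactions' true Ct values average below 40. The EM estimate should land near the complete-data mean. In real data the mechanism parameters are unknown and would also have to be estimated.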
The proposed algorithm is implemented in the R package nondetects. This package also contains the raw data for the three example datasets used in this manuscript. The package is freely available at http://mnmccall.com/software and as part of the Bioconductor project.