Mar Jessica C, Kimura Yasumasa, Schroder Kate, Irvine Katharine M, Hayashizaki Yoshihide, Suzuki Harukazu, Hume David, Quackenbush John
Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.
BMC Bioinformatics. 2009 Apr 19;10:110. doi: 10.1186/1471-2105-10-110.
High-throughput real-time quantitative reverse transcriptase polymerase chain reaction (qPCR) is a widely used technique in experiments that profile gene expression patterns. Current technology allows profiles to be acquired for a moderate number of genes (50 to a few thousand), and this number continues to grow. Choosing appropriate normalization algorithms for qPCR-based data is therefore an important part of the data preprocessing pipeline.
We present and evaluate two data-driven normalization methods that directly correct for technical variation and are robust alternatives to standard housekeeping gene-based approaches. We compared the performance of these methods against a single-gene housekeeping normalization method, and our results suggest that quantile normalization performs best. These methods are implemented in qpcrNorm, a freely available R package distributed through the Bioconductor project.
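To make the comparison concrete, here is a minimal pure-Python sketch of the two kinds of normalization contrasted above, applied to Ct values. It is not the qpcrNorm implementation. The function names and the data layout (a list of samples, each a list of Ct values for the same genes in the same order) are hypothetical. The sketch assumes there are no missing or undetermined Ct values and breaks ties by gene order. The housekeeping variant is the common delta-Ct approach: it subtracts one reference gene's Ct within each sample. The quantile variant forces every sample to share the same distribution of values.

```python
from typing import List

def housekeeping_normalize(samples: List[List[float]], hk_index: int) -> List[List[float]]:
    """Single-gene housekeeping normalization (delta-Ct).

    Subtracts the Ct of the gene at `hk_index` from every Ct in the same sample.
    """
    return [[ct - col[hk_index] for ct in col] for col in samples]

def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize Ct values.

    samples[j][i] is the Ct value of gene i in sample j. All samples must hold
    the same genes and contain no missing values. Ties are broken by gene order
    (a simplifying assumption).
    """
    if not samples:
        return []
    n_genes = len(samples[0])
    if any(len(col) != n_genes for col in samples):
        raise ValueError("all samples must have the same number of genes")

    # For each sample, list gene indices from the smallest to the largest Ct.
    orders = [sorted(range(n_genes), key=lambda i: col[i]) for col in samples]

    # Reference distribution: mean of the k-th smallest Ct across all samples.
    reference = [
        sum(col[order[k]] for col, order in zip(samples, orders)) / len(samples)
        for k in range(n_genes)
    ]

    # Each gene takes the reference value for its rank within its own sample.
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized

samples = [[20.0, 25.0, 30.0], [22.0, 28.0, 26.0]]
print(housekeeping_normalize(samples, 0))  # → [[0.0, 5.0, 10.0], [0.0, 6.0, 4.0]]
print(quantile_normalize(samples))         # → [[21.0, 25.5, 29.0], [21.0, 29.0, 25.5]]
```

The design difference matches the point made in the abstract. The delta-Ct result depends entirely on the chosen reference gene, so if an experimental condition regulates that gene, every value shifts. Quantile normalization uses no single reference gene. It instead assumes that the overall Ct distribution should be the same in every sample, which is more plausible when many genes are profiled.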
The utility of our approaches is clearest in situations where standard housekeeping genes are regulated by the experimental conditions. For large qPCR-based data sets, our approaches represent robust, data-driven strategies for normalization.