Center for Quantitative Biology, Peking University, Beijing, 100871, China.
BMC Genomics. 2013 Jan 16;14:31. doi: 10.1186/1471-2164-14-31.
Microarray technology is widely used to monitor expression changes in thousands of genes simultaneously. However, labeling and hybridization require a relatively large amount of RNA, which makes microarray experiments difficult when biological material is limited and has led to the development of many methods for preparing and amplifying mRNA. Amplification methods have been reported to introduce bias, which may strongly hamper subsequent interpretation of the results. A major challenge is how to correct for this bias before further analysis.
In this article, we observed bias in rice gene expression microarray data generated with the Affymetrix one-cycle and two-cycle RNA labeling protocols, and validated it with real-time PCR. Based on these data, we proposed a statistical framework to model the process of two-cycle linear mRNA amplification and established a linear model for probe-level correction. Maximum Likelihood Estimation (MLE) was applied to obtain a robust estimate of the Retaining Rate for each probe. After bias correction, established pre-processing methods such as PDNN can be applied to complete pre-processing. We then evaluated our model, and the results suggest that it can effectively increase the quality of the microarray raw data: (i) it decreases the Coefficient of Variation of the PM intensities within probe sets; (ii) it distinguishes the microarray samples from the five stages of rice stamen development more clearly; (iii) it improves the correlation coefficients among stamen microarray samples. We also discuss the necessity of model-based adjustment by comparing it with a simpler adjustment method.
We conclude that the adjustment model is necessary and can effectively improve the quality of gene expression estimates obtained from microarray raw data.
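One of the evaluation criteria named above, the Coefficient of Variation (CV) of PM intensities within a probe set, can be illustrated with a minimal sketch. The function name and the intensity values below are hypothetical and are not taken from the study; the sketch also assumes the CV is the sample standard deviation divided by the mean, which the abstract does not specify.

```python
from statistics import mean, stdev


def coefficient_of_variation(pm_intensities):
    """Return sample standard deviation divided by mean for one probe set."""
    m = mean(pm_intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(pm_intensities) / m


# Hypothetical PM intensities for a single probe set (illustrative values only,
# not data from the paper), before and after some bias-correction step.
raw_pm = [100.0, 200.0, 300.0]
corrected_pm = [180.0, 200.0, 220.0]

cv_raw = coefficient_of_variation(raw_pm)
cv_corrected = coefficient_of_variation(corrected_pm)

print(f"CV before correction: {cv_raw:.3f}")
print(f"CV after correction:  {cv_corrected:.3f}")
```

A lower CV after correction would indicate that the probes in a set agree more closely with one another, which is the sense in which the criterion is used here.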