Wu Zhijin, Irizarry Rafael A
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
J Comput Biol. 2005 Jul-Aug;12(6):882-93. doi: 10.1089/cmb.2005.12.882.
High-density oligonucleotide expression arrays are a widely used tool for the measurement of gene expression on a large scale. Affymetrix GeneChip arrays appear to dominate this market. These arrays use short oligonucleotides to probe for genes in an RNA sample. Due to optical noise, nonspecific hybridization, probe-specific effects, and measurement error, ad hoc measures of expression that summarize probe intensities can lead to imprecise and inaccurate results. Various researchers have demonstrated that expression measures based on simple statistical models can provide great improvements over the ad hoc procedure offered by Affymetrix. Recently, physical models based on molecular hybridization theory have been proposed as useful tools for predicting, for example, nonspecific hybridization. These physical models show great potential for improving existing expression measures. In this paper, we suggest that the system producing the measured intensities is too complex to be fully described with these relatively simple physical models, and we propose empirically motivated stochastic models that complement the above-mentioned molecular hybridization theory to provide a comprehensive description of the data. We discuss how the proposed model can be used to obtain improved measures of expression that are useful to data analysts.