Zhao Hong-Ya, Yue Patrick Y K, Fang Kai-Tai
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.
J Biopharm Stat. 2004 Aug;14(3):629-46. doi: 10.1081/BIP-200025654.
DNA microarrays offer a powerful and effective technology for monitoring changes in the expression levels of thousands of genes simultaneously. The technology is widely applied to explore quantitative alterations in gene regulation in response to a variety of factors, including disease and exposure to toxicants. A common task in analyzing microarray data is to identify genes that are differentially expressed between two experimental conditions. Because of the large number of genes, the small number of arrays, and the low signal-to-noise ratio of microarray data, many traditional approaches appear unsuitable. In this paper, a multivariate mixture model is used to model the expression levels of replicated arrays, treating differentially expressed genes as outliers in the expression data. To detect the outliers of the multivariate mixture model, an effective and robust statistical method is applied to microarray analysis for the first time. The method identifies outliers by analyzing the kurtosis coefficient (KC) of projections of the multivariate data arising from the mixture model. We apply the multivariate KC algorithm to our microarray experiment, which compares a control with a toxicant treatment. After data preprocessing, differentially expressed genes are successfully identified among the 1824 genes on the UCLA M07 microarray chip. We also use RT-PCR and two robust statistical methods, minimum covariance determinant (MCD) and minimum volume ellipsoid (MVE), to verify the expression levels of the outlier genes identified by the KC algorithm. We conclude that this robust multivariate tool is practical and effective for detecting differentially expressed genes.
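The kurtosis-projection idea described above can be illustrated with a small sketch. It assumes each gene is a point in R^p with one coordinate per replicate array (for example, treatment-versus-control log ratios). The sketch whitens the data, searches for the projection directions with maximal and minimal kurtosis, and flags genes whose projected values lie far from the median in MAD units. This is a simplified stand-in, not the authors' KC algorithm. The random-direction search, the fixed-point refinement, the cutoff of 4 robust standard deviations, and all function names (`whiten`, `kurtosis_along`, `max_kurtosis_direction`, `kc_outliers`) are hypothetical choices made for illustration. The demo data are simulated and are not taken from the UCLA M07 experiment.

```python
import numpy as np


def whiten(X):
    """Center X (genes x replicates) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # Symmetric inverse square root of the covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return Xc @ inv_sqrt


def kurtosis_along(Z, d):
    """Kurtosis coefficient (4th central moment / variance^2) of Z projected on d."""
    y = Z @ d
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2


def max_kurtosis_direction(Z, d0, iters=100):
    """Refine d0 toward a local maximum of kurtosis on the unit sphere.

    On whitened data the projected variance is identical for every unit d,
    so maximizing kurtosis reduces to maximizing the convex fourth moment
    mean((Z d)^4); the update d <- grad / ||grad|| never decreases it.
    """
    d = d0 / np.linalg.norm(d0)
    for _ in range(iters):
        grad = Z.T @ (Z @ d) ** 3  # proportional to the gradient
        d = grad / np.linalg.norm(grad)
    return d


def kc_outliers(X, n_random=200, cutoff=4.0, seed=0):
    """Flag genes (rows of X) that are outlying on max/min-kurtosis projections.

    Hypothetical simplification: directions come from random unit vectors
    (plus a fixed-point refinement for the maximum), and a gene is flagged
    when |y - median| / MAD exceeds `cutoff` on either projection.
    """
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    cands = rng.standard_normal((n_random, Z.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    kurt = np.array([kurtosis_along(Z, d) for d in cands])
    d_max = max_kurtosis_direction(Z, cands[np.argmax(kurt)])
    d_min = cands[np.argmin(kurt)]

    flagged = np.zeros(len(X), dtype=bool)
    for d in (d_max, d_min):
        y = Z @ d
        med = np.median(y)
        mad = 1.4826 * np.median(np.abs(y - med))  # MAD scaled to a normal SD
        flagged |= np.abs(y - med) / mad > cutoff
    return flagged


# Simulated demo (not the paper's data): 1800 unchanged genes and 24 genes
# up-regulated by 4 units in each of 4 hypothetical replicate arrays.
rng = np.random.default_rng(1)
null_genes = rng.standard_normal((1800, 4))
diff_genes = rng.standard_normal((24, 4)) + 4.0
X = np.vstack([null_genes, diff_genes])
flags = kc_outliers(X)
print("flagged among unchanged genes:", flags[:1800].sum())
print("flagged among shifted genes:", flags[1800:].sum())
```

In this simulation the shifted genes form a small group along one direction, which raises the kurtosis of that projection, so the maximum-kurtosis direction exposes them. The minimum-kurtosis direction is kept because a large, well-separated group of outliers makes the projected distribution bimodal, which lowers rather than raises kurtosis.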