Zhao Hongya, Chan Kwok-Leung, Cheng Lee-Ming, Yan Hong
Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.
BMC Bioinformatics. 2008;9 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-9-S1-S9.
Identification of differentially expressed genes is a typical objective when analyzing gene expression data. Recently, Bayesian hierarchical models have become increasingly popular for solving this type of problem. These models perform well in accommodating the noise, variability and low replication of microarray data. However, they ignore the correlation between the different fluorescent signals measured from a gene spot, which can adversely affect the data analysis step. In fact, the intensities of the two signals are significantly correlated across samples, and the larger the log-transformed intensities are, the smaller the correlation is.
Motivated by the complicated error relations in microarray data, we propose a multivariate hierarchical Bayesian framework for data analysis in replicated microarray experiments. Gene expression data are modelled by a multivariate normal distribution, parameterized by the corresponding mean vectors and covariance matrices with a conjugate prior distribution. Within the Bayesian framework, a generalized likelihood ratio test (GLRT) is also developed to infer the gene expression patterns. Simulation studies show that the proposed approach has better operating characteristics and a lower false discovery rate (FDR) than existing methods, especially when the correlation coefficient is large. The approach is illustrated with two examples of microarray analysis. The proposed method successfully detects significant genes closely related to the experimental states, which are verified by biological information.
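To make the conjugate structure concrete, the following is a minimal sketch of a posterior update for one gene, assuming a normal-inverse-Wishart prior on the mean vector and covariance matrix. The abstract says only that a conjugate prior is used, so the specific prior, the function name `niw_posterior`, the hyperparameter names (`mu0`, `kappa0`, `nu0`, `psi0`) and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def niw_posterior(x, mu0, kappa0, nu0, psi0):
    """Normal-inverse-Wishart conjugate update (assumed prior, for illustration).

    x    : (n, d) array, rows are replicates, columns are channels
           (e.g. the two log-transformed fluorescent intensities of a spot)
    mu0  : (d,) prior mean vector
    kappa0, nu0 : prior pseudo-counts for the mean and the covariance
    psi0 : (d, d) prior scale matrix
    """
    x = np.asarray(x, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, d = x.shape

    xbar = x.mean(axis=0)                  # sample mean vector
    centered = x - xbar
    scatter = centered.T @ centered        # sum of outer products about xbar

    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    diff = (xbar - mu0).reshape(-1, 1)
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, psi_n

# Hypothetical toy data: three replicates of a two-channel spot.
x = np.array([[8.1, 7.9],
              [8.4, 8.3],
              [7.8, 7.7]])
mu_n, kappa_n, nu_n, psi_n = niw_posterior(
    x, mu0=np.zeros(2), kappa0=1.0, nu0=4.0, psi0=np.eye(2)
)

# Posterior mean of the covariance matrix (defined when nu_n > d + 1).
sigma_mean = psi_n / (nu_n - x.shape[1] - 1)
print(mu_n)
print(sigma_mean)
```

The off-diagonal entry of `sigma_mean` is what carries the between-channel covariance that a univariate model would ignore. The GLRT step mentioned in the abstract is not sketched here because the abstract does not specify its form.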
The multivariate Bayesian model is compatible with the dependence between mean and variance in the univariate Bayesian model, and it relaxes the constant coefficient of variation assumption between measurements by adding a covariance structure. Because the model fits microarray data well, it significantly improves the identification of differentially expressed genes.