Walter and Eliza Hall Institute of Medical Research, Melbourne, Vic 3052, Australia.
BMC Bioinformatics. 2013 May 26;14:165. doi: 10.1186/1471-2105-14-165.
Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes.
This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intra-spot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all.
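The change of variables from channel log-intensities to M-values and A-values can be illustrated with a small simulation. The sketch below is not the authors' analysis pipeline; it only assumes that the two log2 channel intensities at a spot share a common variance sigma^2 and have intra-spot correlation rho. Under that assumption, standard variance algebra gives Var(M) = 2 sigma^2 (1 - rho), Var(A) = sigma^2 (1 + rho) / 2 and Cov(M, A) = 0, so a larger intra-spot correlation makes the M-values less variable relative to the A-values. The function names, the baseline log-intensity of 10 and the parameter values are all hypothetical, chosen for illustration.

```python
import math
import random


def ma_transform(red, green):
    """Return (M, A) for one spot from raw red and green intensities.

    M is the log2 ratio of the two channels; A is their average log2 intensity.
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2.0


def simulate_spots(n, sigma, rho, seed=0):
    """Simulate n spots as (red, green) raw intensities.

    Assumption for illustration: the log2 intensities of the two channels are
    bivariate normal with common standard deviation sigma and correlation rho,
    centred on an arbitrary baseline log2 intensity of 10.
    """
    rng = random.Random(seed)
    spots = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        log_r = 10.0 + sigma * z1
        # Mixing z1 and z2 this way gives correlation rho with log_r.
        log_g = 10.0 + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        spots.append((2.0 ** log_r, 2.0 ** log_g))
    return spots


def covariance(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def ma_moments(spots):
    """Return (Var(M), Var(A), Cov(M, A)) estimated from simulated spots."""
    m_values, a_values = zip(*(ma_transform(r, g) for r, g in spots))
    return (
        covariance(m_values, m_values),
        covariance(a_values, a_values),
        covariance(m_values, a_values),
    )
```

Calling `ma_moments(simulate_spots(20000, sigma=1.0, rho=0.8))` should return sample moments close to the theoretical values 0.4, 0.9 and 0 implied by the formulas above, up to simulation noise.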
The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.