Reverter Antonio, Barris Wes, McWilliam Sean, Byrne Keren A, Wang Yong H, Tan Siok H, Hudson Nick, Dalrymple Brian P
Bioinformatics Group, CSIRO Livestock Industries, Queensland Bioscience Precinct, St Lucia, QLD 4067, Australia.
Bioinformatics. 2005 Apr 1;21(7):1112-20. doi: 10.1093/bioinformatics/bti124. Epub 2004 Nov 25.
Clusters of genes encoding proteins with related functions, or in the same regulatory network, often exhibit expression patterns that are correlated over a large number of conditions. Protein associations and gene regulatory networks can be modelled from expression data. We address the question of which of several normalization methods is optimal prior to computing the correlation of the expression profiles between every pair of genes.
We use gene expression data from five experiments with a total of 78 hybridizations and 23 diverse conditions. We explore nine methods of data normalization, formed from all possible combinations of normalization techniques that account for between- and within-gene and experiment variation. We compare the resulting empirical distribution of gene × gene correlations with expectations and apply cross-validation to test the performance of each method in predicting accurate functional annotation. We conclude that normalization methods based on mixed-model equations are optimal.
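As an illustration of the gene × gene correlation step only, the following minimal sketch computes the Pearson correlation between every pair of expression profiles. It assumes a plain per-gene standardization (mean zero, unit variance), which is not the mixed-model normalization the abstract recommends. The function names and the toy data are hypothetical and do not come from the paper.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(profile):
    """Centre one gene's profile on its mean and scale it to unit variance."""
    mu = mean(profile)
    sd = pstdev(profile)  # population standard deviation
    if sd == 0:
        raise ValueError("a constant profile has no defined correlation")
    return [(x - mu) / sd for x in profile]


def pairwise_correlations(expression):
    """Return {(gene_a, gene_b): Pearson r} for every pair of genes.

    `expression` maps a gene name to its list of expression values,
    one per condition, with the same condition order for all genes.
    """
    z = {gene: standardize(values) for gene, values in expression.items()}
    result = {}
    for a, b in combinations(sorted(z), 2):
        n = len(z[a])
        # The mean product of two standardized profiles is Pearson's r.
        result[(a, b)] = sum(x * y for x, y in zip(z[a], z[b])) / n
    return result


# Hypothetical toy data: three genes measured over four conditions.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
correlations = pairwise_correlations(toy)
```

In this toy example geneA and geneB rise together while geneC falls, so the first pair is perfectly positively correlated and both pairs involving geneC are perfectly negatively correlated. On real data, the normalization applied before this step would change the resulting distribution of correlations, which is the comparison the study makes.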