Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan.
DNA Res. 2009 Oct;16(5):249-60. doi: 10.1093/dnares/dsp016. Epub 2009 Sep 18.
Gene coexpression information is useful for predicting gene function. Several coexpression databases have been constructed for model organisms from the large amount of publicly available gene expression data measured on GeneChip platforms. In these databases, Pearson's correlation coefficients (PCCs) of gene expression patterns are widely used as the measure of gene coexpression. Although both the coexpression measure and the GeneChip summarization method affect the performance of a gene coexpression database, previous studies tested these calculation procedures with only a small number of samples and a particular species. Evaluating the effectiveness of coexpression measures therefore requires assessment with large-scale microarray data. We first examined the characteristics of PCC and found that the optimal PCC threshold for retrieving functionally related genes depended on the method used to construct the gene expression database and on the target gene function. In addition, we found that this problem could be overcome by using correlation ranks instead of correlation values. This observation was evaluated with large-scale gene expression data for four species: Arabidopsis, human, mouse and rat.
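The contrast between correlation values and correlation ranks can be illustrated with a minimal sketch. It assumes a genes x samples NumPy array of already summarized expression values; the function name `correlation_ranks` and the synthetic two-module data are hypothetical illustrations and are not taken from the paper, whose exact ranking procedure is not described in this abstract.

```python
import numpy as np


def correlation_ranks(expr):
    """Return (pcc, rank) for a genes x samples expression array.

    rank[i, j] is the position of gene j among gene i's coexpression
    partners when sorted by descending PCC (1 = most correlated).
    Illustrative helper only, not the authors' implementation.
    """
    pcc = np.corrcoef(expr)                # genes x genes Pearson matrix
    masked = pcc.copy()
    np.fill_diagonal(masked, -np.inf)      # a gene is not its own partner
    order = np.argsort(-masked, axis=1)    # partner indices, best first
    n = expr.shape[0]
    rank = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    rank[rows, order] = np.arange(1, n + 1)  # self always lands at rank n
    return pcc, rank


# Synthetic data (assumption): two latent expression programs, three genes each.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 50))
expr = np.vstack(
    [base[0] + 0.3 * rng.normal(size=50) for _ in range(3)]
    + [base[1] + 0.3 * rng.normal(size=50) for _ in range(3)]
)

pcc, rank = correlation_ranks(expr)

# Retrieval by rank: take each gene's two best-ranked partners,
# with no PCC threshold to tune.
top2 = np.argsort(rank, axis=1)[:, :2]
```

Selecting partners by `rank <= k` avoids choosing an absolute PCC cutoff, which the abstract reports as varying with database construction method and target gene function. Note that the ranks are directional: `rank[i, j]` and `rank[j, i]` generally differ, so a symmetric score would need an extra combining step.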