Stalteri Maria A, Harrison Andrew P
Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK.
BMC Bioinformatics. 2007 Jan 15;8:13. doi: 10.1186/1471-2105-8-13.
Affymetrix GeneChip technology enables the parallel observation of tens of thousands of genes. Probe set annotations must be reliable if biological inferences are to be drawn about genes that undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case.
We have made a case study of the mouse gene Surf4, chosen because both Affymetrix and Bioconductor reported in early 2004 that it was represented by the same eight probe sets on the MOE430A array. Only five of the probe sets actually detect Surf4 transcripts. Two of the probe sets detect splice variants of Surf2. We have also studied the expression changes of the eight probe sets in a public-domain microarray experiment. The transcripts for Surf4 are correlated with one another in time, as are the transcripts for Surf2. However, the transcripts for Surf4 are not correlated with those for Surf2. This proof of principle shows that observations of expression can be used to confirm or refute suspected annotation discrepancies. We have also investigated groups of probe sets on the RAE230A array that are assigned to the same LocusID but show large variances in differential expression in at least one of three different rat experiments. The probe set groups with high variances are found to represent cases of alternative splicing, use of alternative poly(A) signals, or incorrect annotations.
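The two checks described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not say which correlation measure or variance threshold was used, so Pearson correlation and a user-supplied threshold are assumptions here, and all probe names, locus names, and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def pairwise_correlations(profiles):
    """Correlate every pair of probe sets across the time course."""
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def flag_discordant_groups(fold_changes, groups, threshold):
    """Return group ids whose probe sets disagree strongly.

    A group is flagged when the population variance of its probe sets'
    log fold changes exceeds the (assumed, user-chosen) threshold.
    """
    flagged = []
    for group_id, probes in groups.items():
        values = [fold_changes[p] for p in probes]
        if len(values) > 1 and pvariance(values) > threshold:
            flagged.append(group_id)
    return flagged


# Hypothetical log2 expression values over five time points.
profiles = {
    "probe_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_B": [1.1, 2.1, 2.9, 4.2, 5.0],
    "probe_C": [3.0, 1.0, 4.0, 1.5, 3.5],
}
correlations = pairwise_correlations(profiles)

# Hypothetical log fold changes for probe sets sharing an annotation.
fold_changes = {"p1": 1.0, "p2": 1.1, "p3": 2.0, "p4": -1.5}
groups = {"locus_1": ["p1", "p2"], "locus_2": ["p3", "p4"]}
discordant = flag_discordant_groups(fold_changes, groups, threshold=0.5)
```

In this toy data, probe_A and probe_B track each other closely while probe_C does not, which is the pattern one would expect if probe_C measured a different transcript. Likewise, locus_2 is flagged because its two probe sets move in opposite directions. A flagged group is a candidate for manual inspection, since high variance may reflect alternative splicing or alternative poly(A) usage as well as a wrong annotation.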
Our results indicate that some probe sets should not be considered unique measures of transcription, because their individual probes map to more than one transcript, depending on the biological condition. Our results highlight the need for care when assessing whether groups of probe sets all measure the same transcript.