Okoniewski Michał J, Miller Crispin J
Paterson Institute For Cancer Research, Christie Hospital site, University of Manchester, Wilmslow Road, Manchester, M20 4BX, UK.
BMC Bioinformatics. 2006 Jun 2;7:276. doi: 10.1186/1471-2105-7-276.
Microarrays measure the binding of nucleotide sequences to a set of sequence-specific probes. This information is combined with annotation specifying the relationship between probes and targets, and is used to make inferences about transcript expression and, ultimately, gene expression. In some situations a single probe is capable of hybridizing to more than one transcript; in others, multiple probes can target a single sequence. These 'multiply targeted' probes can result in non-independence between measured expression levels.
An analysis of these relationships for Affymetrix arrays considered both the extent and the influence of exact matches between probe and transcript sequences. For the popular HGU133A array, approximately half of the probesets were found to interact in this way. Both real and simulated expression datasets were used to examine how these effects influenced the expression signal. Multiple targeting was found to increase signal strength for the affected probesets, but its major effect was to significantly increase the correlation between them, even when only a single probe from a probeset was involved. By building a network of probe-probeset-transcript relationships, it is possible to identify families of interacting probesets. More than 10% of the families contain members annotated to different genes or even different Unigene clusters. Within a family, a mixture of genuine biological and artefactual correlations can occur.
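One way to picture the family-building step is as finding connected components in a graph whose nodes are probesets and transcripts, with an edge wherever a probe from a probeset exactly matches a transcript. The sketch below is illustrative only and is not the authors' implementation: it assumes the exact-match probe-to-transcript mapping has already been computed, and the function name `probeset_families`, its arguments, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def probeset_families(probe_hits, probe_to_probeset):
    """Group probesets that are linked, directly or indirectly, by shared transcripts.

    probe_hits: dict mapping probe id -> set of transcript ids it matches exactly
                (assumed to be precomputed; hypothetical input format).
    probe_to_probeset: dict mapping probe id -> probeset id.
    Returns a sorted list of families, each a sorted list of probeset ids.
    Probesets that share no transcript with any other probeset are omitted.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each probeset to every transcript matched by any of its probes.
    for probe, transcripts in probe_hits.items():
        probeset_node = ("probeset", probe_to_probeset[probe])
        find(probeset_node)  # register probesets even if they match nothing
        for transcript in transcripts:
            union(probeset_node, ("transcript", transcript))

    # Collect the probesets in each connected component.
    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "probeset":
            groups[find(node)].add(node[1])

    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Toy data (made up for illustration): one probe of A_at also matches tx2,
# which is the target of B_at, so A_at and B_at fall into the same family.
probe_to_probeset = {"p1": "A_at", "p2": "A_at", "p3": "B_at", "p4": "C_at"}
probe_hits = {"p1": {"tx1"}, "p2": {"tx1", "tx2"}, "p3": {"tx2"}, "p4": {"tx3"}}
families = probeset_families(probe_hits, probe_to_probeset)
print(families)
```

In this toy case a single shared probe is enough to join two probesets, which mirrors the observation above that one probe can induce correlation between probesets. Checking whether the members of each family are annotated to the same gene would require a separate annotation table, which is not modelled here.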
Multiple targeting is not only prevalent, but also significant. The ability of probesets to hybridize to more than one gene product can lead to false positives when analysing gene expression. Comprehensive annotation describing multiple targeting is required when interpreting array data.