Department of Epidemiology and Public Health, Imperial College London, London, UK.
Nucleic Acids Res. 2010 Jan;38(1):e4. doi: 10.1093/nar/gkp853. Epub 2009 Oct 23.
Affymetrix has recently developed whole-transcript GeneChips ('Gene' and 'Exon' arrays), which interrogate exons along the length of each gene. Although each probe on these arrays is intended to hybridize perfectly to only one transcriptional target, many probes match multiple transcripts located in different parts of the genome or alternative isoforms of the same gene. Existing statistical methods for estimating expression do not take this into account and are thus prone to producing inflated estimates. We propose a method, Multi-Mapping Bayesian Gene eXpression (MMBGX), which disaggregates the signal at 'multi-match' probes. When applied to Gene arrays, MMBGX removes the upward bias of gene-level expression estimates. When applied to Exon arrays, it can further disaggregate the signal between alternative transcripts of the same gene, providing expression estimates of individual splice variants. We demonstrate the performance of MMBGX on simulated data and a tissue mixture data set. We then show that MMBGX can estimate the expression of alternative isoforms within one experimental condition, confirming our results by RT-PCR. Finally, we show that our method for detecting differential splicing has a lower error rate than standard exon-level approaches on a previously validated colon cancer data set.
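The upward bias described in the abstract can be illustrated with a toy example. The sketch below is not MMBGX, which is a Bayesian model. It swaps in ordinary least squares and assumes a noise-free additive signal, in which each probe reports the sum of the expression of every transcript it matches. The mapping matrix, the expression values, and the function names `naive_estimate` and `disaggregated_estimate` are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical probe-to-transcript mapping (rows: probes, columns: transcripts A, B).
# A 1 means the probe sequence matches that transcript.
mapping = np.array([
    [1, 0],  # probe 0 matches transcript A only
    [1, 0],  # probe 1 matches transcript A only
    [1, 1],  # probe 2 matches both A and B (a multi-match probe)
    [0, 1],  # probe 3 matches transcript B only
], dtype=float)

# Made-up expression levels for transcripts A and B.
true_expression = np.array([2.0, 5.0])

# Assumed additive, noise-free signal model (an illustration, not the MMBGX likelihood).
signal = mapping @ true_expression


def naive_estimate(mapping, signal):
    """Average the signal over all probes matching each transcript, ignoring multi-matches."""
    return (mapping.T @ signal) / mapping.sum(axis=0)


def disaggregated_estimate(mapping, signal):
    """Least-squares fit of the additive model, splitting multi-match signal between transcripts."""
    estimate, *_ = np.linalg.lstsq(mapping, signal, rcond=None)
    return estimate


naive = naive_estimate(mapping, signal)
disaggregated = disaggregated_estimate(mapping, signal)
print("naive:", naive)
print("disaggregated:", disaggregated)
```

In this toy setting the naive averages exceed the true values for both transcripts because the multi-match probe contributes its combined signal to each, while the least-squares fit recovers the values used to generate the data. Real array data are noisy and are usually modelled on a log scale, so this shows only the direction of the bias, not a practical estimator.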