Kapur Karen, Jiang Hui, Xing Yi, Wong Wing Hung
Department of Statistics, Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA.
Bioinformatics. 2008 Dec 15;24(24):2887-93. doi: 10.1093/bioinformatics/btn571. Epub 2008 Nov 4.
Microarray designs have become increasingly probe-rich, enabling targeting of specific features, such as individual exons or single nucleotide polymorphisms. These arrays have the potential to achieve quantitative high-throughput estimates of transcript abundances, but currently these estimates are affected by biases due to cross-hybridization, in which probes hybridize to off-target transcripts.
To study cross-hybridization, we map Affymetrix exon array probes to a set of annotated mRNA transcripts, allowing a small number of mismatches or insertions/deletions between the two sequences. Based on a systematic study of the degree to which probes with a given match type to a transcript are affected by cross-hybridization, we developed a strategy to correct for cross-hybridization biases in gene-level expression estimates. Comparison with Solexa ultra-high-throughput sequencing data demonstrates that correction for cross-hybridization leads to a significant improvement of gene expression estimates.
We provide mappings between human and mouse exon array probes and off-target transcripts, together with software that extends the GeneBASE program to generate gene-level expression estimates including the cross-hybridization correction. Both are available at http://biogibbs.stanford.edu/~kkapur/GeneBase/.
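The probe-to-transcript mapping step described above can be illustrated with a minimal sketch. It is not the authors' pipeline or the GeneBASE code: it scans by brute force, handles mismatches only (no insertions/deletions), and assumes probe sequences are given in the same orientation as the transcripts. The function names, the `targets` dictionary, and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_near_matches(probe: str, transcript: str,
                      max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Return (offset, mismatch_count) for each ungapped alignment of
    `probe` to `transcript` with at most `max_mismatches` mismatches."""
    hits = []
    n = len(probe)
    for start in range(len(transcript) - n + 1):
        mismatches = 0
        for a, b in zip(probe, transcript[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches at this offset
        else:
            # Inner loop finished without break: alignment is within tolerance.
            hits.append((start, mismatches))
    return hits


def find_off_target_hits(probes: Dict[str, str],
                         transcripts: Dict[str, str],
                         targets: Dict[str, str],
                         max_mismatches: int = 2
                         ) -> List[Tuple[str, str, int, int]]:
    """List (probe_id, transcript_id, offset, mismatches) for every near
    match to a transcript other than the probe's intended target."""
    off_target = []
    for probe_id, probe_seq in probes.items():
        for tx_id, tx_seq in transcripts.items():
            if tx_id == targets.get(probe_id):
                continue  # skip the intended (on-target) transcript
            for offset, mm in find_near_matches(probe_seq, tx_seq,
                                                max_mismatches):
                off_target.append((probe_id, tx_id, offset, mm))
    return off_target


# Toy data (hypothetical): one short probe and three transcripts.
probes = {"p1": "ACGTACGT"}
transcripts = {
    "tA": "TTACGTACGTTT",  # intended target, exact match
    "tB": "GGACGTTCGTGG",  # off-target, one mismatch
    "tC": "CCCCCCCCCCCC",  # unrelated
}
targets = {"p1": "tA"}
off_target_hits = find_off_target_hits(probes, transcripts, targets,
                                       max_mismatches=1)
```

Grouping the resulting hits by mismatch count gives the kind of "match type" categories whose cross-hybridization effect could then be studied. For real 25-mer probes against a full transcriptome, an indexed aligner would be needed in place of this exhaustive scan.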