Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia.
Bioinformatics. 2012 Apr 15;28(8):1102-8. doi: 10.1093/bioinformatics/bts089. Epub 2012 Feb 21.
In the past decade, a number of technologies to quantify allele-specific expression (ASE) in a genome-wide manner have become available to researchers. We investigate the application of single-nucleotide polymorphism (SNP) microarrays to this task, exploring data obtained from both cell lines and primary tissue for which both RNA and DNA profiles are available.
We analyze data from two experiments that use high-density Illumina Infinium II genotyping arrays to measure ASE. We first preprocess each data set by removing outlier samples, normalizing carefully and applying a two-step filter that removes SNPs showing no evidence of expression in the samples being analyzed and calls that are clear genotyping errors. We then compare three tests for detecting ASE: one previously published and two novel. These tests differ in the level at which they operate (per SNP per individual, or per SNP) and in the input data they require. Using SNPs from imprinted genes as true positives for ASE, we observe that sensitivity varies between testing procedures and improves with increasing sample size. Methods that rely on RNA signal alone were found to perform best across a range of metrics. The top-ranked SNPs recovered by all methods appear to be reasonable candidates for ASE.
Analysis was carried out in R (http://www.R-project.org/) using existing functions.
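To make the idea of a per-SNP test concrete, the following is a minimal sketch of one way such a test could be set up. It is not one of the three tests compared in the paper, whose details the abstract does not give, and it is written in Python for illustration although the authors worked in R. It assumes that each heterozygous individual contributes a pair of allele intensities (A, B) from RNA and from DNA. The function names `one_sample_t`, `abs_log_ratio` and `per_snp_ase_t` are hypothetical. Absolute log-ratios are used because an imprinted SNP can favour either allele, depending on the parent of origin.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.0):
    """Return the one-sample t statistic for H0: mean(values) == mu."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = stdev(values)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    return (mean(values) - mu) / (sd / math.sqrt(n))


def abs_log_ratio(a, b):
    """Absolute log2 ratio of two positive allele intensities."""
    return abs(math.log2(a / b))


def per_snp_ase_t(rna_pairs, dna_pairs):
    """Illustrative per-SNP statistic (an assumption, not the paper's test).

    rna_pairs and dna_pairs hold one (A, B) intensity pair per heterozygous
    individual, in the same order. For each individual the allelic imbalance
    in RNA is compared with the imbalance in DNA, which serves as a
    balanced reference; the differences are then tested against zero.
    """
    if len(rna_pairs) != len(dna_pairs):
        raise ValueError("RNA and DNA must cover the same individuals")
    diffs = [
        abs_log_ratio(ra, rb) - abs_log_ratio(da, db)
        for (ra, rb), (da, db) in zip(rna_pairs, dna_pairs)
    ]
    return one_sample_t(diffs)


if __name__ == "__main__":
    # Made-up intensities: RNA strongly favours one allele, DNA is balanced.
    rna = [(8.0, 1.0), (1.0, 8.0), (6.0, 1.0), (1.0, 10.0)]
    dna = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.2), (1.0, 1.0)]
    print(per_snp_ase_t(rna, dna))
```

A large positive statistic suggests more allelic imbalance in RNA than in DNA at that SNP. A test that operates per SNP per individual, or one that uses RNA signal alone, would need a different construction.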