Panousis Nikolaos I, Gutierrez-Arcelus Maria, Dermitzakis Emmanouil T, Lappalainen Tuuli
Genome Biol. 2014 Sep 20;15(9):467. doi: 10.1186/s13059-014-0467-2.
RNA sequencing (RNA-seq) is the current gold-standard method to quantify gene expression for expression quantitative trait locus (eQTL) studies. However, a potential caveat in these studies is that RNA-seq reads carrying the non-reference allele at variant loci may have a lower probability of mapping correctly to the reference genome, which could bias gene quantifications and cause false positive eQTL associations. In this study, we analyze the effect of this allelic mapping bias on eQTL discovery.
We simulate RNA-seq read mapping across 9.5 million common SNPs and indels and find that 15.6% of variants show a biased mapping rate for reference versus non-reference reads. However, removing potentially biased RNA-seq reads from an eQTL dataset of 185 individuals has a very small effect on gene and exon quantifications and on eQTL discovery. We detect only a handful of likely false positive eQTLs, and eQTL SNPs overall show no significant enrichment for high mapping bias.
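To make the notion of a biased mapping rate concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that, for each variant, the simulation yields the number of simulated reads and the number that mapped correctly, separately for the reference and non-reference alleles. The `SimulatedVariant` record, the function names, the variant identifiers, the counts, and the 0.05 tolerance are all hypothetical choices made for illustration; the abstract does not state how bias was scored.

```python
from dataclasses import dataclass


@dataclass
class SimulatedVariant:
    """Hypothetical per-variant summary of a read-mapping simulation."""
    variant_id: str
    ref_simulated: int  # simulated reads carrying the reference allele
    ref_mapped: int     # of those, reads mapped to the correct position
    alt_simulated: int  # simulated reads carrying the non-reference allele
    alt_mapped: int     # of those, reads mapped to the correct position


def mapping_rate(mapped: int, simulated: int) -> float:
    """Fraction of simulated reads that mapped correctly."""
    if simulated <= 0:
        raise ValueError("at least one simulated read is required")
    return mapped / simulated


def is_biased(v: SimulatedVariant, tolerance: float = 0.05) -> bool:
    """Flag a variant when the two alleles' mapping rates differ by more
    than `tolerance` (an assumed, illustrative threshold)."""
    diff = (mapping_rate(v.ref_mapped, v.ref_simulated)
            - mapping_rate(v.alt_mapped, v.alt_simulated))
    return abs(diff) > tolerance


def biased_fraction(variants: list[SimulatedVariant],
                    tolerance: float = 0.05) -> float:
    """Share of variants flagged as showing allelic mapping bias."""
    if not variants:
        raise ValueError("no variants supplied")
    flagged = sum(1 for v in variants if is_biased(v, tolerance))
    return flagged / len(variants)


# Made-up counts, purely to exercise the functions.
example = [
    SimulatedVariant("var_a", 100, 100, 100, 80),   # favors reference reads
    SimulatedVariant("var_b", 100, 100, 100, 100),  # no difference
    SimulatedVariant("var_c", 100, 90, 100, 100),   # favors non-reference reads
    SimulatedVariant("var_d", 100, 98, 100, 97),    # within tolerance
]
print(biased_fraction(example))
```

A catalog of putatively biased loci would, under these assumptions, be the list of variants for which `is_biased` returns true; reads overlapping those loci could then be filtered before quantification.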
Our results suggest that RNA-seq quantifications are generally robust to allelic mapping bias and that this bias does not severely affect eQTL discovery. Nevertheless, we provide our catalog of putatively biased loci so that future RNA-seq studies can better control for mapping bias and obtain more accurate results.