Harvey Chris T, Moyerbrailean Gregory A, Davis Gordon O, Wen Xiaoquan, Luca Francesca, Pique-Regi Roger
Center for Molecular Medicine and Genetics, Department of Obstetrics and Gynecology, Wayne State University, 540 E Canfield, Scott Hall, Detroit, MI 48201, USA and Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2015 Apr 15;31(8):1235-42. doi: 10.1093/bioinformatics/btu802. Epub 2014 Dec 4.
Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele-specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available, it could be inferred from the RNA-seq reads directly. However, no existing methods jointly infer genotypes and conduct ASE inference while accounting for uncertainty in the genotype calls.
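The standard ASE test described above can be sketched as an exact two-sided binomial test of a 1:1 allelic ratio at a single heterozygous site. This is a minimal illustration, assuming the reads at the site have already been summarized as reference and alternate allele counts; the helper name is hypothetical, and this is the generic test the abstract refers to, not QuASAR's model (which additionally handles genotype uncertainty, base-call error and over-dispersion).

```python
import math


def ase_binomial_pvalue(ref: int, alt: int) -> float:
    """Exact two-sided binomial test of a 1:1 allelic ratio (hypothetical helper).

    ref, alt: numbers of RNA-seq reads matching the reference and the
    alternate allele at one heterozygous site.
    """
    if ref < 0 or alt < 0:
        raise ValueError("read counts must be non-negative")
    n = ref + alt
    if n == 0:
        return 1.0  # no reads, so no evidence against a 1:1 ratio
    k = min(ref, alt)
    # Under H0 the reference count is Binomial(n, 0.5), which is symmetric,
    # so the two-sided p-value is twice the smaller tail, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(ase_binomial_pvalue(5, 5))   # balanced site: 1.0
print(ase_binomial_pvalue(18, 2))  # strongly imbalanced site: about 4.0e-04
```

Note that the test is only meaningful if the site is truly heterozygous; applying it to a mis-called homozygous site would report spurious "imbalance", which is the problem the joint approach below is meant to address.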
We present QuASAR, quantitative allele-specific analysis of reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls, while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high-quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.
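To illustrate why genotype uncertainty matters when genotypes are inferred from RNA-seq reads, the sketch below computes per-site genotype posteriors under a simple binomial read model with a symmetric base-call error rate eps. The error rate, the Hardy-Weinberg prior, the allele frequency and the function name are all illustrative assumptions, and allelic over-dispersion is ignored; QuASAR's actual model is described in the paper and at http://github.com/piquelab/QuASAR.

```python
import math


def genotype_posteriors(ref, alt, eps=0.01, allele_freq=0.5):
    """Posterior probabilities of genotypes RR, RA, AA at one site (hypothetical).

    Each read is assumed to show the reference allele with a probability that
    depends on the genotype, with symmetric base-call error rate eps.
    Over-dispersion is deliberately left out of this toy model.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5)")
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie in (0, 1)")
    # Probability that a single read shows the reference allele.
    p_ref = {"RR": 1.0 - eps, "RA": 0.5, "AA": eps}
    # Hardy-Weinberg prior from an assumed reference-allele frequency q.
    q = allele_freq
    prior = {"RR": q * q, "RA": 2 * q * (1 - q), "AA": (1 - q) ** 2}
    # Log prior plus log binomial likelihood; the binomial coefficient is the
    # same for every genotype, so it cancels on normalization and is omitted.
    log_post = {
        g: math.log(prior[g]) + ref * math.log(p) + alt * math.log(1.0 - p)
        for g, p in p_ref.items()
    }
    # Normalize in a numerically stable way (subtract the maximum first).
    m = max(log_post.values())
    weights = {g: math.exp(v - m) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


print(genotype_posteriors(12, 10))  # heterozygous call is near-certain
print(genotype_posteriors(2, 0))    # too few reads to rule out RA
```

With only two reference reads, the heterozygous genotype still keeps roughly a third of the posterior mass under these assumptions; a joint model can carry this kind of uncertainty into the ASE test rather than forcing a hard genotype call first.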
http://github.com/piquelab/QuASAR.
fluca@wayne.edu or rpique@wayne.edu
Supplementary Material is available at Bioinformatics online.