Hu Yi-Juan, Sun Wei, Tzeng Jung-Ying, Perou Charles M
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322.
Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599.
J Am Stat Assoc. 2015;110(511):962-974. doi: 10.1080/01621459.2015.1038449. Epub 2015 Nov 7.
Studies of expression quantitative trait loci (eQTLs) offer insight into the molecular mechanisms of loci that were found to be associated with complex diseases; these mechanisms can be classified into cis- and trans-acting regulation. At present, high-throughput RNA sequencing (RNA-seq) is rapidly replacing expression microarrays to assess gene expression abundance. Unlike microarrays, which measure only the total expression of each gene, RNA-seq also provides information on allele-specific expression (ASE), which can be used to distinguish cis-eQTLs from trans-eQTLs and, more importantly, to enhance cis-eQTL mapping. However, assessing the cis-effect of a candidate eQTL on a gene requires knowledge of the haplotypes connecting the candidate eQTL and the gene, which cannot be inferred with certainty. The existing two-stage approach, which first phases the candidate eQTL against the gene and then treats the inferred phase as observed in the association analysis, tends to attenuate the estimated cis-effect and reduce the power to detect a cis-eQTL. In this article, we provide a maximum-likelihood framework for cis-eQTL mapping with RNA-seq data. Our approach integrates haplotype inference and association analysis into a single stage, and is thus unbiased and statistically powerful. We also develop a pipeline that performs a comprehensive scan of all local eQTLs for all genes in the genome while controlling the false discovery rate, and we implement the methods in a computationally efficient software program. The advantages of the proposed methods over existing ones are demonstrated through realistic simulation studies and an application to empirical breast cancer data from The Cancer Genome Atlas project.
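The attenuation argument can be illustrated with a deliberately simplified Python toy. This is not the authors' model. It assumes the following: only binomial ASE read counts are used, from individuals heterozygous at both the candidate eQTL and an exonic SNP of the gene; each individual's phase probability q is treated as known, whereas the authors infer haplotypes within the same likelihood; and total read counts are ignored. The toy compares the two-stage approach, which calls the more likely phase and then treats it as observed, with a single-stage likelihood that averages over both possible phases weighted by q. All function names and simulation settings are hypothetical.

```python
import math
import random


def logit_inv(b):
    """Allelic proportion implied by a log-fold cis-effect b."""
    return 1.0 / (1.0 + math.exp(-b))


def simulate(n_ind=300, true_b=1.0, seed=2015):
    """Hypothetical toy ASE data for doubly heterozygous individuals.

    Each record is (y, n, q): y reads from allele A of the exonic SNP out of
    n allele-specific reads, and q = P(allele A lies on the eQTL-alt haplotype).
    """
    rng = random.Random(seed)
    pi = logit_inv(true_b)
    data = []
    for _ in range(n_ind):
        conf = rng.uniform(0.55, 0.99)                    # phasing confidence
        q = conf if rng.random() < 0.5 else 1.0 - conf    # P(phase = 1)
        phase = 1 if rng.random() < q else 0              # true phase, calibrated to q
        n = rng.randint(20, 60)                           # allele-specific read depth
        p = pi if phase == 1 else 1.0 - pi                # P(read comes from allele A)
        y = sum(rng.random() < p for _ in range(n))       # reads from allele A
        data.append((y, n, q))
    return data


def loglik_two_stage(b, data):
    """Stage 1 calls the more likely phase; stage 2 treats it as observed.

    The binomial coefficient is dropped because it does not depend on b.
    """
    lp, l1p = math.log(logit_inv(b)), math.log(logit_inv(-b))
    total = 0.0
    for y, n, q in data:
        k = y if q > 0.5 else n - y   # reads on the eQTL-alt haplotype under the called phase
        total += k * lp + (n - k) * l1p
    return total


def loglik_joint(b, data):
    """Single-stage likelihood: mix over both phases, weighted by q.

    The binomial coefficient is common to both phase components and to all b,
    so it is dropped.
    """
    lp, l1p = math.log(logit_inv(b)), math.log(logit_inv(-b))
    total = 0.0
    for y, n, q in data:
        a = math.log(q) + y * lp + (n - y) * l1p           # phase = 1
        c = math.log(1.0 - q) + y * l1p + (n - y) * lp     # phase = 0
        m = max(a, c)
        total += m + math.log(math.exp(a - m) + math.exp(c - m))  # log-sum-exp
    return total


def argmax_grid(f, data, lo=-3.0, hi=3.0, steps=600):
    """Maximize a one-parameter log-likelihood on a grid (step 0.01 by default)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: f(b, data))


data = simulate()
b_two_stage = argmax_grid(loglik_two_stage, data)
b_joint = argmax_grid(loglik_joint, data)
print(f"two-stage: {b_two_stage:.2f}  joint: {b_joint:.2f}  (truth 1.00)")
```

In this setup, a mis-called phase flips an individual's apparent allelic imbalance. The two-stage estimate therefore tends to be pulled toward zero. The mixture likelihood instead accounts for the chance that each call is wrong.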