Acharya Chaitanya R, McCarthy Janice M, Owzar Kouros, Allen Andrew S
Program in Computational Biology and Bioinformatics, Duke University, 101 Science Dr, Durham, 27708, USA.
Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Rd, Durham, 27708, USA.
BMC Bioinformatics. 2016 Jun 24;17:257. doi: 10.1186/s12859-016-1123-5.
To better understand complex diseases, it is important to understand how genetic variation in regulatory regions affects gene expression. Genetic variants in these regions have been shown to activate transcription in a tissue-specific manner. It is therefore important to map these expression quantitative trait loci (eQTL) using a statistically disciplined approach that jointly models all tissues and uses all available information to maximize the power of eQTL mapping. In this context, we propose a score test-based approach in which we model tissue-specificity as a random effect and test for an overall shift in gene expression combined with tissue-specific effects due to genetic variants.
Our approach has 1) a distinct computational edge over, and 2) statistical power comparable to, existing joint modeling approaches such as MetaTissue eQTL and eQTL-BMA. Using simulations, we show that our method increases the power to detect eQTLs compared with a tissue-by-tissue approach and can exceed MetaTissue eQTL and eQTL-BMA in computational speed. We apply our method to two publicly available expression datasets from normal human brains, one comprising four brain regions from 150 neuropathologically normal samples and another comprising ten brain regions from 134 neuropathologically normal samples, and show that by jointly analyzing multiple brain regions with our method, we identify eQTLs within more genes than three commonly used existing methods.
Because we employ a score test-based approach, there is no need for parameter estimation under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, significantly decreasing computation time. Our method also accommodates the analysis of next-generation sequencing data. For example, by modeling gene transcripts in a fashion analogous to tissues in our current formulation, one would be able to test for both a variant's overall effect across all isoforms of a gene and transcript-specific effects. We implement our approach in the R package JAGUAR, which is available from the Comprehensive R Archive Network (CRAN) repository.
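The computational argument above (parameters are estimated only under the null hypothesis and then reused for every variant) can be illustrated with a minimal sketch. This is not the JAGUAR implementation or its mixed-model test; it assumes a much simpler setting, a single-tissue linear model with an intercept-only null and a fixed genotype effect, and the function names `fit_null` and `score_test` are hypothetical.

```python
from math import erfc, sqrt


def fit_null(expression):
    """Fit the intercept-only null model once.

    Returns the residuals and the maximum-likelihood variance estimate.
    Nothing here depends on any genotype, so it can be reused for
    every variant tested against this expression vector.
    """
    n = len(expression)
    mean_y = sum(expression) / n
    resid = [y - mean_y for y in expression]
    sigma2 = sum(r * r for r in resid) / n
    return resid, sigma2


def score_test(genotype, resid, sigma2):
    """Score test of 'no genotype effect' using only the null fit.

    Assumption: simple fixed-effect linear model, 1-df chi-square
    reference distribution. Returns (statistic, p_value).
    """
    n = len(genotype)
    mean_g = sum(genotype) / n
    centered = [g - mean_g for g in genotype]
    score = sum(c * r for c, r in zip(centered, resid))
    variance = sigma2 * sum(c * c for c in centered)
    if variance == 0.0:
        # Monomorphic variant or constant expression: no information.
        return 0.0, 1.0
    stat = score * score / variance
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    variants = {
        "snp_a": [0, 0, 1, 1, 2, 2],
        "snp_b": [1, 1, 1, 1, 1, 1],
    }
    resid, sigma2 = fit_null(expression)  # estimated once
    for name, genotype in variants.items():
        stat, p_value = score_test(genotype, resid, sigma2)
        print(name, round(stat, 4), round(p_value, 4))
```

The null fit happens once outside the loop, and each variant then costs only a few sums. A likelihood-ratio or Wald test would instead refit the model under the alternative for every variant.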