Acharya Chaitanya R, Owzar Kouros, Allen Andrew S
Program in Computational Biology and Bioinformatics, Duke University, 2424 Erwin Road, Suite 1104, Durham, NC 27710, USA.
Department of Biostatistics and Bioinformatics, Duke University, 2424 Erwin Road, Suite 1104, Durham, NC 27710, USA.
BMC Bioinformatics. 2017 Oct 18;18(1):455. doi: 10.1186/s12859-017-1856-9.
DNA methylation is an important tissue-specific epigenetic event that influences transcriptional regulation of gene expression. Differentially methylated CpG sites may act as mediators between genetic variation and gene expression, and this relationship can be exploited when mapping multi-tissue expression quantitative trait loci (eQTL). Current multi-tissue eQTL mapping techniques are limited to exploiting gene expression patterns across multiple tissues, in either a joint-tissue or a tissue-by-tissue framework. We present a new statistical approach that models the effect of germ-line variation on tissue-specific gene expression in the presence of effects due to DNA methylation.
Our method efficiently models genetic and epigenetic variation to identify genomic regions of interest containing combinations of mRNA transcripts, CpG sites, and SNPs by jointly testing for a genotypic effect and for higher-order interaction effects between genotype, methylation, and tissue. Using Monte Carlo simulations, we demonstrate that, in the presence of both genetic and DNA methylation effects, our approach has greater statistical power to detect eQTLs than current eQTL mapping approaches. When applied to an array-based dataset from 150 neuropathologically normal adult human brains, our method identifies eQTLs that were undetected by standard tissue-by-tissue or joint-tissue eQTL mapping techniques. As an example, our method identifies eQTLs by leveraging methylated CpG sites in a LIM homeobox member gene (LHX9), which may have a role in neural development.
Our score test-based approach does not need parameter estimation under the alternative hypothesis. As a result, our model parameters are estimated only once for each mRNA-CpG pair, rather than separately for every SNP tested against that pair. Our model specifically studies the effects of non-coding regions of DNA (in this case, CpG sites) on mapping eQTLs. However, we can easily model micro-RNAs instead of CpG sites to study the effects of post-transcriptional events on eQTL mapping. Our model's flexible framework also allows us to investigate other genomic events, such as alternative gene splicing, by extending the model to include gene isoform-specific data.
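To illustrate why a score test needs only one null-model fit per mRNA-CpG pair, the sketch below uses a deliberately simplified model that is not taken from the paper: an ordinary fixed-effects linear regression with two tissues, in which the null model contains tissue and methylation terms and the tested block contains the genotype main effect plus its interactions with tissue and methylation. It ignores the correlation between tissues sampled from the same individual, which the published method may handle differently. The function names `build_designs`, `fit_null`, and `score_test` are hypothetical.

```python
import numpy as np
from scipy import stats


def build_designs(g, m, t):
    """Long-format designs: g = genotype dosage, m = methylation,
    t = 0/1 tissue indicator (assumed two-tissue layout)."""
    ones = np.ones_like(m, dtype=float)
    X0 = np.column_stack([ones, t, m, m * t])          # null: no genotype terms
    G = np.column_stack([g, g * t, g * m, g * m * t])  # terms tested jointly
    return X0, G


def fit_null(y, X0):
    """Fit y ~ X0 once; the result is reused for every SNP."""
    Q, _ = np.linalg.qr(X0)          # orthonormal basis of the null design
    resid = y - Q @ (Q.T @ y)        # residuals under the null
    sigma2 = resid @ resid / len(y)  # ML variance estimate under the null
    return Q, resid, sigma2


def score_test(null_fit, G):
    """Rao score test that all coefficients on the columns of G are zero."""
    Q, resid, sigma2 = null_fit
    G_adj = G - Q @ (Q.T @ G)        # remove the null covariates from G
    u = G_adj.T @ resid              # score vector (up to a 1/sigma2 factor)
    V = G_adj.T @ G_adj              # its covariance (same factor)
    stat = u @ np.linalg.solve(V, u) / sigma2
    return stat, stats.chi2.sf(stat, G.shape[1])


# Simulated example: 150 individuals, 2 tissues, 3 candidate SNPs.
rng = np.random.default_rng(1)
n = 150
t = np.repeat([0.0, 1.0], n)                  # tissue indicator
m = rng.normal(size=2 * n)                    # methylation per sample
snps = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
g_causal = np.tile(snps[:, 0], 2)             # SNP 0 acts through methylation
y = 0.5 * m + 0.4 * g_causal * m + rng.normal(size=2 * n)

X0, _ = build_designs(g_causal, m, t)
null_fit = fit_null(y, X0)                    # estimated once for this pair
for j in range(snps.shape[1]):
    _, G = build_designs(np.tile(snps[:, j], 2), m, t)
    stat, p = score_test(null_fit, G)
    print(f"SNP {j}: score statistic = {stat:.2f}, p = {p:.3g}")
```

In this sketch the null fit never sees the genotype, so looping over SNPs costs only a projection and a small linear solve per SNP. No model is refitted under the alternative.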