Rantalainen Mattias, Lindgren Cecilia M, Holmes Christopher C
Department of Statistics, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
PLoS One. 2015 May 18;10(5):e0127882. doi: 10.1371/journal.pone.0127882. eCollection 2015.
Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.
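To illustrate the contrast described above, the following is a minimal sketch, not the authors' method: the abstract does not say which robust estimator was used, so this assumes a Huber M-estimator fitted by iteratively reweighted least squares (IRLS), with the residual scale re-estimated from the median absolute residual. It regresses expression on allelic dosage (0, 1, 2) with no covariates and estimates coefficients only, without a significance test. The function names (`weighted_fit`, `ols_fit`, `huber_fit`), the tuning constant k = 1.345 and the toy data are all hypothetical.

```python
import statistics


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_fit(x, y):
    """Conventional linear model: every observation gets weight 1."""
    return weighted_fit(x, y, [1.0] * len(x))


def huber_fit(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of (intercept, slope) via IRLS (assumed estimator)."""
    a, b = ols_fit(x, y)  # start from the least-squares solution
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median |residual| / 0.6745 (consistent for Gaussian errors)
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break  # (near-)perfect fit, nothing to downweight
        # Huber weights: 1 for small residuals, k*scale/|r| for large ones
        cut = k * scale
        w = [1.0 if abs(r) <= cut else cut / abs(r) for r in resid]
        a_new, b_new = weighted_fit(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Hypothetical toy eQTL: true model is expression = 1.0 + 0.5 * dosage
dosage = [0] * 4 + [1] * 4 + [2] * 4
noise = [0.1, -0.1, 0.05, -0.05] * 3
expr = [1.0 + 0.5 * g + e for g, e in zip(dosage, noise)]
expr[-1] += 10.0  # one atypical observation (outlier)

print("OLS slope:  ", ols_fit(dosage, expr)[1])
print("Huber slope:", huber_fit(dosage, expr)[1])
```

With these toy data the single outlier pulls the least-squares slope to 1.75, while the Huber fit downweights it and stays close to the true 0.5. For real analyses, established implementations such as R's `MASS::rlm` or statsmodels' `RLM` with a `HuberT` norm also provide standard errors for testing the dosage effect.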