Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Bioinformatics. 2012 May 15;28(10):1353-8. doi: 10.1093/bioinformatics/bts163. Epub 2012 Apr 6.
Expression quantitative trait loci (eQTL) analysis links variations in gene expression levels to genotypes. For modern datasets, eQTL analysis is computationally intensive because it involves testing billions of transcript-SNP (single-nucleotide polymorphism) pairs for association. This heavy computational burden makes eQTL analysis less popular and sometimes forces analysts to restrict their attention to a small subset of transcript-SNP pairs. As more transcripts and SNPs are interrogated over a growing number of samples, the demand for faster eQTL analysis tools grows stronger.
We have developed new software for computationally efficient eQTL analysis, called Matrix eQTL. In tests on large datasets, it was 2-3 orders of magnitude faster than existing popular tools for QTL/eQTL analysis, while finding the same eQTLs. The speed is achieved by special preprocessing and by expressing the most computationally intensive part of the algorithm in terms of large matrix operations. Matrix eQTL supports additive linear and ANOVA models with covariates, including models with correlated and heteroskedastic errors. Multiple testing is addressed by calculating the false discovery rate; this can be done separately for cis- and trans-eQTLs.
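The idea of replacing per-pair tests with large matrix operations can be illustrated with a minimal sketch. It is not Matrix eQTL's own code: it assumes the simplest case (simple linear regression, no covariates, no constant rows), uses NumPy, and the function names `standardize_rows` and `all_pairs_t_stats` are hypothetical. After each row is centered and scaled to unit length, a single matrix product yields the correlation of every transcript with every SNP, and each correlation r converts to the regression t-statistic through t = r * sqrt((n - 2) / (1 - r^2)).

```python
import numpy as np

def standardize_rows(m):
    """Center each row and scale it to unit Euclidean length.

    Assumes no row is constant (a constant row would divide by zero).
    """
    centered = m - m.mean(axis=1, keepdims=True)
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

def all_pairs_t_stats(expr, geno):
    """t-statistics for every transcript-SNP pair (simple linear regression).

    expr: array of shape (n_transcripts, n_samples)
    geno: array of shape (n_snps, n_samples)
    Returns an array of shape (n_transcripts, n_snps).
    """
    n = expr.shape[1]
    # One matrix product gives all pairwise Pearson correlations at once.
    r = standardize_rows(expr) @ standardize_rows(geno).T
    # Convert each correlation to the t-statistic of the regression slope.
    return r * np.sqrt((n - 2) / (1.0 - r**2))

# Toy data (illustrative only): 3 transcripts, 5 SNPs, 40 samples.
rng = np.random.default_rng(0)
geno = rng.integers(0, 3, size=(5, 40)).astype(float)  # genotypes coded 0/1/2
expr = rng.normal(size=(3, 40))
expr[0] += 0.8 * geno[2]  # plant an association between transcript 0 and SNP 2

t = all_pairs_t_stats(expr, geno)
print(t.shape)
```

The preprocessing cost here is paid once per matrix, after which the work for all pairs is a single matrix multiplication, an operation that optimized linear algebra libraries handle far more efficiently than a loop over individual regressions.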