Zhou Heather J, Ge Xinzhou, Li Jingyi Jessica
Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Current address: Department of Statistics, Oregon State University, Corvallis, OR 97330, USA.
bioRxiv. 2023 Aug 29:2023.08.28.555191. doi: 10.1101/2023.08.28.555191.
A central task in expression quantitative trait locus (eQTL) analysis is to identify cis-eGenes (henceforth "eGenes"), i.e., genes whose expression levels are regulated by at least one local genetic variant. Among the existing eGene identification methods, FastQTL is considered the gold standard but is computationally expensive as it requires thousands of permutations for each gene. Alternative methods such as eigenMT and TreeQTL have lower power than FastQTL. In this work, we propose ClipperQTL, which reduces the number of permutations needed from thousands to 20 for data sets with large sample sizes (≥450) by using the contrastive strategy developed in Clipper; for data sets with smaller sample sizes, it uses the same permutation-based approach as FastQTL. We show that ClipperQTL performs as well as FastQTL and runs about 500 times faster if the contrastive strategy is used and 50 times faster if the conventional permutation-based approach is used. The R package ClipperQTL is available at https://github.com/heatherjzhou/ClipperQTL.
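To make concrete why per-gene permutation testing is expensive, the following is a minimal Python sketch of a generic permutation test for one gene. It is not the ClipperQTL or FastQTL implementation. The function names, the choice of test statistic (maximum absolute Pearson correlation between expression and the local variants), and the simulated data are all assumptions made for illustration, and covariate adjustment is omitted.

```python
import numpy as np


def max_abs_corr(expr, genotypes):
    """Largest absolute Pearson correlation between one gene's expression
    (length n) and each local variant (columns of an n x m matrix).
    Assumes no variant is constant across samples."""
    e = expr - expr.mean()
    g = genotypes - genotypes.mean(axis=0)
    denom = np.sqrt((e @ e) * (g * g).sum(axis=0))
    return float(np.max(np.abs(g.T @ e) / denom))


def permutation_pvalue(expr, genotypes, n_perm=1000, seed=0):
    """Empirical gene-level p-value: shuffle expression across samples,
    recompute the statistic, and count how often the permuted value
    reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = max_abs_corr(expr, genotypes)
    null = np.array(
        [max_abs_corr(rng.permutation(expr), genotypes) for _ in range(n_perm)]
    )
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# Hypothetical simulated data: 200 samples, 50 local variants coded 0/1/2.
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
expr_egene = 0.8 * genotypes[:, 0] + rng.normal(size=200)  # driven by variant 0
expr_null = rng.normal(size=200)                           # no genetic effect

p_egene = permutation_pvalue(expr_egene, genotypes)
p_null = permutation_pvalue(expr_null, genotypes)
print(p_egene, p_null)
```

Each gene requires `n_perm` recomputations of the statistic, so the cost grows linearly with the number of permutations. This is the cost that cutting the permutation count from thousands to 20 addresses.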