Heilbrunn Department of Population & Family Health, Columbia University, New York, NY, USA.
Department of Biostatistics, Columbia University, New York, NY, USA.
Bioinformatics. 2017 Jul 15;33(14):2123-2130. doi: 10.1093/bioinformatics/btx119.
Over the past decade, our understanding of the role of genetic variation in complex human diseases has improved remarkably, especially through genome-wide association studies. However, the underlying molecular mechanisms are still poorly characterized, impeding the development of therapeutic interventions. Identifying genetic variants that influence the expression level of a gene, i.e. expression quantitative trait loci (eQTLs), can help us understand how genetic variants influence traits at the molecular level. While most eQTL studies focus on identifying mean effects on gene expression using linear regression, evidence suggests that genetic variation can affect the entire distribution of the expression level. Motivated by the potential for such higher-order associations, several studies have investigated variance eQTLs.
In this paper, we develop a quantile rank-score based test (QRank), which provides an easy way to identify eQTLs that are associated with the conditional quantile functions of gene expression. We applied QRank to the Genotype-Tissue Expression project, an international tissue bank for studying the relationship between genetic variation and gene expression in human tissues, and found that QRank complements existing methods and identifies new eQTLs with heterogeneous effects across different quantile levels. Notably, we show that the eQTLs identified by QRank but missed by linear regression show greater enrichment for genome-wide significant SNPs from the GWAS catalog, and are also more likely to be tissue specific than eQTLs identified by linear regression.
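To make the idea of a quantile rank-score test concrete, the following is a minimal sketch in Python, not the QRank package's implementation. It assumes a single quantile level, no covariates (so the null model is an intercept only and its fitted quantile is simply a sample quantile), and a chi-square reference distribution with one degree of freedom. The function name `quantile_rank_score_test` and its arguments are hypothetical and chosen for illustration only.

```python
import math


def quantile_rank_score_test(expression, genotype, tau=0.5):
    """Illustrative single-quantile rank-score test (hypothetical helper).

    Assumptions: no covariates, one quantile level tau, and an asymptotic
    chi-square(1) reference distribution. Returns (statistic, p_value).
    """
    n = len(expression)
    if n != len(genotype) or n < 3:
        raise ValueError("expression and genotype must have equal length >= 3")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")

    # Null model (intercept only): the fitted tau-th quantile is a sample quantile.
    ordered = sorted(expression)
    q_tau = ordered[max(0, math.ceil(n * tau) - 1)]

    # Quantile score function psi_tau(u) = tau - 1{u < 0}, evaluated at null residuals.
    scores = [tau - (1.0 if y < q_tau else 0.0) for y in expression]

    # Remove the intercept from the genotype by centering it.
    mean_g = sum(genotype) / n
    centered = [g - mean_g for g in genotype]

    # Score statistic and its variance under the null hypothesis.
    s = sum(c * r for c, r in zip(centered, scores)) / math.sqrt(n)
    v = tau * (1.0 - tau) * sum(c * c for c in centered) / n
    if v == 0.0:
        raise ValueError("genotype has no variation")

    stat = s * s / v
    # Upper tail of a chi-square with 1 degree of freedom: P(Z^2 > stat).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Calling the function at several values of `tau` (for example 0.25, 0.5 and 0.75) gives one statistic per quantile level, which is where effects that differ across the expression distribution would become visible. Combining those levels into a single test requires accounting for the correlation between the per-quantile scores, which this sketch does not attempt.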
An R package is available on CRAN at https://cran.r-project.org/web/packages/QRank.
Supplementary data are available at Bioinformatics online.