Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan.
PLoS Comput Biol. 2022 Aug 29;18(8):e1010436. doi: 10.1371/journal.pcbi.1010436. eCollection 2022 Aug.
Genomic variations associated with gene expression levels are called expression quantitative trait loci (eQTL). Most eQTL may affect total gene expression levels by regulating the transcriptional activity of a specific promoter. However, directly exploring genomic loci associated with promoter activity using RNA-seq data has been challenging, because eQTL analyses work with total expression levels estimated by summing the levels of all isoforms transcribed from distinct promoters. Here we propose a new method that uses conventional RNA-seq data to identify genomic loci associated with promoter activity, called promoter usage quantitative trait loci (puQTL). By leveraging public RNA-seq datasets from the lymphoblastoid cell lines of 438 individuals from the GEUVADIS project, we obtained promoter activity estimates and mapped 2,592 puQTL at the 10% FDR level. The results of puQTL mapping enabled us to interpret the manner in which genomic variations regulate gene expression. We found that 310 puQTL genes (16.1%) were not detected by eQTL analysis, suggesting that our pipeline can identify novel variant-gene associations. Furthermore, we identified genomic loci associated with the activity of "hidden" promoters, which standard eQTL studies have ignored. We found that most puQTL signals were concordant with at least one genome-wide association study (GWAS) signal, enabling novel interpretations of the molecular mechanisms of complex traits. Our results emphasize the importance of re-analyzing public RNA-seq datasets to obtain novel insights into gene regulation by genomic variations and their contributions to complex traits.
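The contrast between gene-level and promoter-level quantification described above can be illustrated with a toy calculation. The abstract does not specify how promoter activity is estimated, so the following is only a minimal sketch under assumptions: the isoform names, expression values, isoform-to-promoter mapping, and the functions `gene_total` and `promoter_activity` are all hypothetical, and promoter activity is simply taken as the sum of expression over isoforms sharing a promoter.

```python
from collections import defaultdict

# Hypothetical isoform-level expression (e.g. TPM) of one gene in two individuals.
isoform_expr = {
    "ind_A": {"iso1": 10.0, "iso2": 5.0, "iso3": 15.0},
    "ind_B": {"iso1": 20.0, "iso2": 10.0, "iso3": 0.0},
}

# Hypothetical annotation: which promoter each isoform is transcribed from.
isoform_to_promoter = {"iso1": "P1", "iso2": "P1", "iso3": "P2"}


def gene_total(expr):
    """Gene-level expression: sum over all isoforms (the quantity eQTL analyses use)."""
    return sum(expr.values())


def promoter_activity(expr, mapping):
    """Assumed promoter-level activity: sum over isoforms that share a promoter."""
    totals = defaultdict(float)
    for isoform, value in expr.items():
        totals[mapping[isoform]] += value
    return dict(totals)


for individual, expr in isoform_expr.items():
    print(individual, gene_total(expr), promoter_activity(expr, isoform_to_promoter))
```

In this toy data both individuals have the same gene-level total while their promoter-level values differ, which is the kind of difference that summing over all isoforms would hide.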