Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, SE1 9RT, United Kingdom.
UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, United Kingdom.
Genome Res. 2022 Aug 25;32(8):1565-1572. doi: 10.1101/gr.276296.121.
Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect allelic biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type-specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease-relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes.
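The abstract scores alignment quality by counting heterozygous sites whose measured allelic bias is off by more than 10%, but it does not spell out the exact criterion or the statistical test PAC uses. The sketch below is therefore only an illustration under stated assumptions: it takes allelic bias to be the reference-allele fraction ref / (ref + alt), treats a site as "incorrectly biased" when that fraction differs from the simulated true fraction by more than 0.10 (absolute), and uses an exact two-sided binomial test, a common generic ASE test and not necessarily PAC's. The function names are hypothetical.

```python
from math import comb


def ref_ratio(ref_count: int, alt_count: int) -> float:
    """Fraction of allelic reads at a heterozygous site that carry the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no allelic reads at this site")
    return ref_count / total


def is_incorrectly_biased(observed: float, truth: float, threshold: float = 0.10) -> bool:
    """Hypothetical criterion (assumption): flag a site whose measured reference
    fraction deviates from the simulated true fraction by more than `threshold`."""
    return abs(observed - truth) > threshold


def binomial_two_sided_p(ref_count: int, alt_count: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of ref_count out of n reads against fraction p.

    Sums the probabilities of every outcome no more likely than the observed one.
    Plain floating point is adequate for depths up to roughly 1,000 reads; for
    deeper sites a library routine such as scipy.stats.binomtest is safer.
    """
    n = ref_count + alt_count
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # A small relative tolerance keeps outcomes tied with the observed one.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# A simulated site with no true ASE (true reference fraction 0.5),
# quantified by two hypothetical pipelines.
naive = ref_ratio(62, 38)     # reference-biased alignment
diploid = ref_ratio(52, 48)   # personalised (diploid) alignment
print(naive, is_incorrectly_biased(naive, 0.5))      # 0.62 True
print(diploid, is_incorrectly_biased(diploid, 0.5))  # 0.52 False
print(binomial_two_sided_p(10, 0))                   # 0.001953125
```

A reference-biased aligner pushes the naive site past the 10% threshold even though the simulated truth has no imbalance, which is the kind of artefactual ASE call that personalised alignment is meant to reduce.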