Deonovic Benjamin, Wang Yunhao, Weirather Jason, Wang Xiu-Jie, Au Kin Fai
Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA.
Department of Internal Medicine, University of Iowa, Iowa City, IA 52242, USA.
Nucleic Acids Res. 2017 Mar 17;45(5):e32. doi: 10.1093/nar/gkw1076.
Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by their dependence on haplotypes obtained from laborious experiments or from additional genome or family-trio data. In addition, methods for gene isoform level ASE analysis are lacking. We developed a tool, IDP-ASE, for full ASE analysis. By innovatively integrating Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, IDP-ASE greatly improved the accuracy of haplotyping and of ASE quantification at both the gene and gene isoform level, as demonstrated on the gold-standard GM12878 data and on semi-simulated data. Beyond methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that ASE imbalance and non-uniform ASE across gene isoforms are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution for understanding ASE using only RNA sequencing data.
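As a rough illustration of why haplotyping matters for ASE quantification, the sketch below sums per-SNP read counts into haplotype-level counts using a given phase and applies an exact two-sided binomial test against a 50:50 expectation. This is a generic allelic-imbalance test, not the IDP-ASE model. The input format, the function names (`haplotype_counts`, `binomial_two_sided_p`, `allelic_imbalance`) and the example counts are hypothetical. It also simplifies by treating counts at different SNPs as independent, even though a read spanning several SNPs would be counted more than once.

```python
from math import exp, lgamma, log


def haplotype_counts(snp_counts, phase):
    """Sum read counts per haplotype across phased heterozygous SNPs.

    snp_counts: list of (ref_count, alt_count) per SNP (hypothetical format).
    phase: list of 0/1 per SNP; 0 means the REF allele lies on haplotype A,
           1 means the ALT allele lies on haplotype A.
    """
    a = b = 0
    for (ref, alt), p in zip(snp_counts, phase):
        if p == 0:
            a, b = a + ref, b + alt
        elif p == 1:
            a, b = a + alt, b + ref
        else:
            raise ValueError("phase entries must be 0 or 1")
    return a, b


def log_binom_pmf(k, n, p):
    # Log-space binomial probability to avoid overflow for large n.
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum all outcomes no likelier than k."""
    probs = [exp(log_binom_pmf(i, n, p)) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not dropped by rounding error.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def allelic_imbalance(snp_counts, phase):
    """Return (fraction of reads on haplotype A, two-sided p-value)."""
    a, b = haplotype_counts(snp_counts, phase)
    n = a + b
    if n == 0:
        raise ValueError("no reads to test")
    return a / n, binomial_two_sided_p(a, n)


if __name__ == "__main__":
    counts = [(30, 10), (12, 28), (25, 15)]  # made-up (REF, ALT) counts
    for ph in ([0, 1, 0], [0, 0, 0]):  # phased vs. naive REF/ALT pooling
        frac, pval = allelic_imbalance(counts, ph)
        print(f"phase={ph}: haplotype A fraction={frac:.3f}, p={pval:.2e}")
```

With the phase `[0, 1, 0]` the counts combine to 83 vs. 37 reads per haplotype, a clear imbalance; pooling REF and ALT counts without phasing (`[0, 0, 0]`) gives 67 vs. 53 and largely masks it. This is the kind of error that accurate haplotyping is meant to avoid.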