Wood David L A, Nones Katia, Steptoe Anita, Christ Angelika, Harliwong Ivon, Newell Felicity, Bruxner Timothy J C, Miller David, Cloonan Nicole, Grimmond Sean M
Queensland Centre for Medical Genomics, University of Queensland, Brisbane, Australia.
QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, QLD, 4006, Australia.
PLoS One. 2015 May 12;10(5):e0126911. doi: 10.1371/journal.pone.0126911. eCollection 2015.
Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual's phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample-specific alignment confounders and random sampling, even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform-derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as showing ASE will be of interest to researchers studying those loci.
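The intragenic SNP aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistical test was used, so the exact two-sided binomial test against a 50:50 expectation below is an assumption (it is a common choice for ASE), and the function names, the input tuple layout and the read counts are all hypothetical. Reads at each heterozygous SNP are assigned to one of the two phased haplotypes, summed across the gene, and then tested for imbalance.

```python
from math import exp, lgamma, log


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value (assumed test, not taken from the paper).

    Sums the probabilities of all outcomes that are no more likely than k.
    """
    if n == 0:
        return 1.0
    # Log-space PMF avoids overflow at high read depth.
    pmf = [
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))
        for i in range(n + 1)
    ]
    threshold = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


def aggregate_by_haplotype(snps):
    """Sum reads per haplotype across the heterozygous SNPs of one gene.

    snps: iterable of (ref_reads, alt_reads, ref_on_hap_a), where
    ref_on_hap_a is True if the reference allele lies on haplotype A.
    """
    hap_a = hap_b = 0
    for ref_reads, alt_reads, ref_on_hap_a in snps:
        if ref_on_hap_a:
            hap_a += ref_reads
            hap_b += alt_reads
        else:
            hap_a += alt_reads
            hap_b += ref_reads
    return hap_a, hap_b


# Hypothetical read counts for three phased SNPs in one gene.
gene_snps = [(30, 10, True), (8, 22, False), (25, 15, True)]
hap_a, hap_b = aggregate_by_haplotype(gene_snps)
p_value = binom_two_sided_p(hap_a, hap_a + hap_b)
print(hap_a, hap_b, p_value)
```

Aggregating by phase rather than by reference or alternate allele is what lets the SNPs of one gene reinforce each other. A SNP whose imbalance runs against the phase would pull the two totals back toward balance, which is one reason such discordant SNPs are worth inspecting individually for isoform-level effects.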