Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Genome Res. 2011 Oct;21(10):1728-37. doi: 10.1101/gr.119784.110. Epub 2011 Aug 26.
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.
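As a rough illustration of how combining information across loci can sharpen locus-specific ASE estimates, the following minimal sketch fits a shared Beta prior to per-locus allelic read ratios and then computes a beta-binomial posterior for each locus. It is not the model described in this paper: the function names (`fit_beta_prior`, `posterior_ase`), the method-of-moments prior fit, the imbalance threshold `delta`, and the example read counts are all assumptions chosen for illustration only.

```python
import random
from statistics import mean, pvariance


def fit_beta_prior(counts):
    """Crude method-of-moments fit of a shared Beta(alpha, beta) prior.

    `counts` is a list of (allele_1_reads, allele_2_reads) pairs, one per locus.
    Assumption: raw read ratios stand in for the true allelic proportions, so
    binomial sampling noise inflates the variance and the prior under-shrinks.
    """
    ratios = [a / (a + b) for a, b in counts if a + b > 0]
    m = mean(ratios)
    v = pvariance(ratios)
    if v <= 0 or v >= m * (1 - m):
        return 1.0, 1.0  # fall back to a uniform prior in degenerate cases
    k = m * (1 - m) / v - 1  # implied prior "sample size" alpha + beta
    return m * k, (1 - m) * k


def posterior_ase(counts, delta=0.1, draws=20000, seed=0):
    """Per-locus posterior mean and Monte Carlo estimate of P(|p - 0.5| > delta)."""
    rng = random.Random(seed)
    alpha, beta = fit_beta_prior(counts)
    results = []
    for a, b in counts:
        # Beta prior + binomial likelihood gives a Beta posterior (conjugacy).
        post_a, post_b = alpha + a, beta + b
        post_mean = post_a / (post_a + post_b)
        hits = sum(
            abs(rng.betavariate(post_a, post_b) - 0.5) > delta
            for _ in range(draws)
        )
        results.append((post_mean, hits / draws))
    return results


# Hypothetical read counts: five roughly balanced loci and one imbalanced locus.
example_counts = [(50, 50), (48, 52), (55, 45), (52, 48), (90, 10), (47, 53)]
for locus, (post_mean, prob) in zip(example_counts, posterior_ase(example_counts)):
    print(locus, round(post_mean, 3), round(prob, 3))
```

In this sketch the shared prior pulls each locus estimate toward the genome-wide average, so a locus is flagged only when its own read counts outweigh that pull. A full hierarchical treatment would instead place priors on the hyperparameters and control the false-discovery rate explicitly.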