Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
Mol Syst Biol. 2011 Aug 2;7:522. doi: 10.1038/msb.2011.54.
To study allele-specific expression (ASE) and allele-specific binding (ASB), that is, differences between the maternally and paternally derived alleles, we have developed a computational pipeline (AlleleSeq). The pipeline first constructs a diploid personal genome sequence (and a corresponding personalized gene annotation) from genomic sequence variants (SNPs, indels, and structural variants), and then identifies allele-specific events as sites with significant differences in the number of mapped reads between the maternal and paternal alleles. We address several technical challenges in constructing a personal diploid genome sequence and aligning reads to it, for example, the bias of reads toward mapping to the reference allele. We have applied AlleleSeq to variation data for NA12878 from the 1000 Genomes Project, together with matched, deeply sequenced RNA-Seq and ChIP-Seq data sets generated for this purpose. In addition to observing fairly widespread allele-specific behavior within individual functional genomic data sets (including results consistent with X-chromosome inactivation), we can study the interaction between ASE and ASB. Furthermore, we investigate the coordination between ASE and the ASB events of multiple transcription factors using a regulatory network framework. Correlation analyses and network motifs show that ASB and ASE are mostly coordinated.
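As an illustration of what "significant differences in the number of mapped reads between maternal and paternal alleles" could mean in practice, the sketch below applies an exact two-sided binomial test to per-site read counts, assuming a null hypothesis in which each read is equally likely to come from either allele. The abstract does not state which test or multiple-testing procedure AlleleSeq uses, so the null model, the Bonferroni correction, the function names, and the site identifiers here are all assumptions made for illustration only.

```python
from math import comb


def two_sided_binomial_p(maternal: int, paternal: int) -> float:
    """Exact two-sided binomial p-value under an assumed 50/50 null.

    Sums the probability of every outcome that is no more likely than the
    observed maternal count. Integer arithmetic keeps the result exact
    until the final division, so deep coverage does not overflow.
    """
    n = maternal + paternal
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = comb(n, maternal)
    tail = sum(c for k in range(n + 1) if (c := comb(n, k)) <= observed)
    return tail / 2**n


def call_allele_specific(counts: dict, alpha: float = 0.05) -> list:
    """Return site ids whose p-value passes a Bonferroni-corrected cutoff.

    `counts` maps a (hypothetical) site id to (maternal_reads, paternal_reads).
    Bonferroni is an assumption chosen for simplicity.
    """
    if not counts:
        return []
    cutoff = alpha / len(counts)
    return [
        site
        for site, (mat, pat) in counts.items()
        if two_sided_binomial_p(mat, pat) <= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical read counts at two heterozygous sites.
    example = {"snp_a": (30, 2), "snp_b": (12, 10)}
    for site, (mat, pat) in example.items():
        print(site, mat, pat, two_sided_binomial_p(mat, pat))
    print(call_allele_specific(example))
```

A test like this is only meaningful if reads are aligned without a preference for one allele, which is why mapping against a diploid personal genome, rather than the reference alone, matters for the counts fed into it.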