Andergassen Daniel, Dotter Christoph P, Kulinski Tomasz M, Guenzl Philipp M, Bammer Philipp C, Barlow Denise P, Pauler Florian M, Hudson Quanah J
CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT 25.3, 1090 Vienna, Austria.
Nucleic Acids Res. 2015 Dec 2;43(21):e146. doi: 10.1093/nar/gkv727. Epub 2015 Jul 21.
Detecting allelic biases from high-throughput sequencing data requires an approach that maximizes sensitivity while minimizing false positives. Here, we present Allelome.PRO, an automated user-friendly bioinformatics pipeline, which uses high-throughput sequencing data from reciprocal crosses of two genetically distinct mouse strains to detect allele-specific expression and chromatin modifications. Allelome.PRO extends approaches used in previous studies that exclusively analyzed imprinted expression to give a complete picture of the 'allelome' by automatically categorizing the allelic expression of all genes in a given cell type into imprinted, strain-biased, biallelic or non-informative. Allelome.PRO offers increased sensitivity to analyze lowly expressed transcripts, together with a robust false discovery rate empirically calculated from variation in the sequencing data. We used RNA-seq data from mouse embryonic fibroblasts from F1 reciprocal crosses to determine a biologically relevant allelic ratio cutoff, and define for the first time an entire allelome. Furthermore, we show that Allelome.PRO detects differential enrichment of H3K4me3 over promoters from ChIP-seq data, validating the RNA-seq results. This approach can be easily extended to analyze histone marks of active enhancers, or transcription factor binding sites, and therefore provides a powerful tool to identify candidate cis-regulatory elements genome-wide.
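As a rough illustration of the four-way categorization described in the abstract, the sketch below assigns a gene to imprinted, strain-biased, biallelic or non-informative from allele-specific read counts in the two reciprocal crosses. It is not the Allelome.PRO implementation: the `CrossCounts` type, the `classify_gene` function, the 0.7 allelic ratio cutoff and the 20-read minimum are all assumptions chosen for the example, and the pipeline's empirically calculated false discovery rate is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class CrossCounts:
    """Allele-specific read counts for one gene in one cross (hypothetical type)."""
    maternal: int  # reads assigned to the maternally inherited allele
    paternal: int  # reads assigned to the paternally inherited allele

    @property
    def total(self) -> int:
        return self.maternal + self.paternal


def classify_gene(forward: CrossCounts, reverse: CrossCounts,
                  cutoff: float = 0.7, min_reads: int = 20) -> str:
    """Classify one gene from a pair of reciprocal crosses.

    Assumed setup: in the forward cross the mother is strain A and the
    father strain B; in the reverse cross the parents are swapped.
    `cutoff` and `min_reads` are illustrative placeholders and must be
    chosen by the user; `cutoff` has to be greater than 0.5.
    """
    if cutoff <= 0.5:
        raise ValueError("cutoff must be greater than 0.5")

    # Too few allele-specific reads in either cross: no call is made.
    if forward.total < min_reads or reverse.total < min_reads:
        return "non-informative"

    # Fraction of reads from the maternal allele in each cross.
    mat_fwd = forward.maternal / forward.total
    mat_rev = reverse.maternal / reverse.total

    # Fraction of reads from strain A in each cross. Strain A is the
    # mother in the forward cross and the father in the reverse cross.
    a_fwd = mat_fwd
    a_rev = reverse.paternal / reverse.total

    low = 1.0 - cutoff

    # Bias follows the parent of origin in both crosses: imprinted.
    if (mat_fwd >= cutoff and mat_rev >= cutoff) or \
       (mat_fwd <= low and mat_rev <= low):
        return "imprinted"

    # Bias follows the same strain in both crosses: strain-biased.
    if (a_fwd >= cutoff and a_rev >= cutoff) or \
       (a_fwd <= low and a_rev <= low):
        return "strain-biased"

    # Enough reads but no consistent bias in this simplified sketch.
    return "biallelic"
```

The reciprocal design is what separates the two kinds of bias: a parent-of-origin effect keeps favoring the maternal (or paternal) allele when the strains are swapped, whereas a strain effect keeps favoring the same strain regardless of which parent carries it.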