Department of Bioengineering, Jacobs School of Engineering, University of California San Diego, La Jolla, California, United States of America.
PLoS One. 2011;6(9):e24051. doi: 10.1371/journal.pone.0024051. Epub 2011 Sep 30.
Identification of diffuse signals from chromatin immunoprecipitation followed by high-throughput massively parallel sequencing (ChIP-Seq) poses significant computational challenges, and few methods are currently available. We present a novel global clustering approach (GCLS) to enrich diffuse ChIP-Seq signals of RNA polymerase II and histone 3 lysine 4 trimethylation (H3K4Me3) and apply it to identify putative long intergenic non-coding RNAs (lincRNAs) in macrophage cells. Our global clustering method compares favorably with SICER, a local clustering method that was also designed to identify diffuse ChIP-Seq signals. The validity of the algorithm is confirmed at several levels. First, 8 of 11 selected putative lincRNA regions in primary macrophages responded to lipopolysaccharide (LPS) treatment as predicted by our computational method. Second, the genes nearest to the lincRNAs are enriched for biological functions related to metabolic processes under resting conditions, but for developmental and immune-related functions under LPS treatment. Third, the putative lincRNAs have conserved promoters, modestly conserved exons, and predicted secondary structures consistent with expectations. Last, they are enriched for motifs of transcription factors such as PU.1 and AP.1, previously shown to be important lineage-determining factors in macrophages, and 83% of them overlap with distal enhancer markers. In summary, GCLS applied to RNA polymerase II and H3K4Me3 ChIP-Seq data can effectively detect putative lincRNAs that exhibit the expected characteristics, as exemplified here in macrophages.
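To make the idea of clustering diffuse ChIP-Seq signal more concrete, the following is a minimal sketch of window-based local clustering in the general spirit of SICER, the comparison method named in the abstract. Reads are binned into fixed windows, windows with enough reads are kept, and kept windows separated by small gaps are merged into broad "islands". This is not GCLS, whose global procedure is not described in this abstract. The function name, parameters, and thresholds are hypothetical, and SICER's statistical scoring of islands against a background model is deliberately omitted.

```python
from typing import Dict, List, Tuple


def call_islands(
    read_starts: List[int],
    window: int = 200,
    min_reads: int = 2,
    max_gap_windows: int = 1,
) -> List[Tuple[int, int]]:
    """Hypothetical, simplified local clustering of diffuse signal.

    Bins read start positions into fixed-size windows, keeps windows with
    at least `min_reads` reads, and merges kept windows that are separated
    by at most `max_gap_windows` sub-threshold windows. Returns half-open
    genomic intervals (start, end) in base pairs.
    """
    # Count reads per window index.
    counts: Dict[int, int] = {}
    for pos in read_starts:
        w = pos // window
        counts[w] = counts.get(w, 0) + 1

    # Windows that pass the (illustrative) read-count threshold.
    eligible = sorted(w for w, c in counts.items() if c >= min_reads)

    # Merge eligible windows into islands, tolerating small gaps.
    islands: List[List[int]] = []
    for w in eligible:
        if islands and w - islands[-1][1] - 1 <= max_gap_windows:
            islands[-1][1] = w  # extend the current island
        else:
            islands.append([w, w])  # start a new island

    # Convert window indices back to base-pair coordinates.
    return [(start * window, (end + 1) * window) for start, end in islands]


if __name__ == "__main__":
    reads = [10, 50, 210, 260, 620, 650, 2000, 2010, 2050]
    print(call_islands(reads))  # [(0, 800), (2000, 2200)]
```

In the usage example, windows 0, 1, and 3 each pass the threshold and are merged across the single empty window 2 into one broad island, while the distant cluster of reads near position 2000 forms a separate island. A purely local rule like this decides island boundaries from neighboring windows only, which is the kind of behavior the abstract contrasts with its global clustering approach.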