PICS: probabilistic inference for ChIP-seq.

Authors

Zhang Xuekui, Robertson Gordon, Krzywinski Martin, Ning Kaida, Droit Arnaud, Jones Steven, Gottardo Raphael

Affiliation

Department of Statistics, University of British Columbia, Vancouver, BC, Canada.

Publication

Biometrics. 2011 Mar;67(1):151-63. doi: 10.1111/j.1541-0420.2010.01441.x.

Abstract

ChIP-seq combines chromatin immunoprecipitation with massively parallel short-read sequencing. While it can profile genome-wide in vivo transcription factor-DNA association with higher sensitivity, specificity, and spatial resolution than ChIP-chip, it poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and from variability and biases in its sequence data. We propose a method called PICS (Probabilistic Inference for ChIP-seq) for identifying regions bound by transcription factors from aligned reads. PICS identifies binding event locations by modeling local concentrations of directional reads, and uses DNA fragment length prior information to discriminate closely adjacent binding events via a Bayesian hierarchical t-mixture model. It uses precalculated, whole-genome read mappability profiles and a truncated t-distribution to adjust binding event models for reads that are missing due to local genome repetitiveness. It estimates uncertainties in model parameters that can be used to define confidence regions on binding event locations and to filter estimates. Finally, PICS calculates a per-event enrichment score relative to a control sample, and can use a control sample to estimate a false discovery rate. Using published GABP and FOXA1 data from human cell lines, we show that PICS' predicted binding sites were more consistent with computationally predicted binding motifs than the alternative methods MACS, QuEST, CisGenome, and USeq. We then use a simulation study to confirm that PICS compares favorably to these methods and is robust to model misspecification.
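As a rough illustration of what "modeling local concentrations of directional reads" can mean, the sketch below places a binding event between a cluster of forward-strand read starts and a cluster of reverse-strand read starts, and takes the distance between the two clusters as a stand-in for the DNA fragment length. This is not the PICS model: the abstract describes a Bayesian hierarchical t-mixture with mappability adjustment, whereas this toy uses plain medians for a single event. The function name `estimate_binding_event` and the sample coordinates are hypothetical.

```python
from statistics import median


def estimate_binding_event(forward_starts, reverse_starts):
    """Toy single-event estimate (an assumption for illustration, not PICS).

    forward_starts: start positions of reads aligned to the forward strand
    reverse_starts: start positions of reads aligned to the reverse strand

    Forward-strand reads tend to pile up upstream of a binding site and
    reverse-strand reads downstream, so the site is taken as the midpoint
    of the two cluster centers and the gap as an approximate fragment length.
    """
    if not forward_starts or not reverse_starts:
        raise ValueError("need at least one read on each strand")

    # Medians are used as a simple robust center; PICS itself uses
    # t-distributions inside a mixture model.
    forward_center = median(forward_starts)
    reverse_center = median(reverse_starts)

    location = (forward_center + reverse_center) / 2
    fragment_length = reverse_center - forward_center
    return location, fragment_length


# Made-up coordinates for a single candidate region.
forward = [95, 100, 105, 98, 102]
reverse = [295, 300, 305, 298, 302]
mu, delta = estimate_binding_event(forward, reverse)
print(mu, delta)
```

A full treatment would fit several such events jointly per region and use a prior on the fragment length to separate closely adjacent events, as the abstract describes.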
