ChIPulate: A comprehensive ChIP-seq simulation pipeline.

Affiliations

Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, TIFR, Bengaluru, Karnataka, India.

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America.

Publication information

PLoS Comput Biol. 2019 Mar 21;15(3):e1006921. doi: 10.1371/journal.pcbi.1006921. eCollection 2019 Mar.

Abstract

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing) is a high-throughput technique to identify genomic regions that are bound in vivo by a particular protein, e.g., a transcription factor (TF). Biological factors, such as chromatin state, indirect and cooperative binding, as well as experimental factors, such as antibody quality, cross-linking, and PCR biases, are known to affect the outcome of ChIP-seq experiments. However, the relative impact of these factors on inferences made from ChIP-seq data is not entirely clear. Here, via a detailed ChIP-seq simulation pipeline, ChIPulate, we assess the impact of various biological and experimental sources of variation on several outcomes of a ChIP-seq experiment, viz., the recoverability of the TF binding motif, accuracy of TF-DNA binding detection, the sensitivity of inferred TF-DNA binding strength, and number of replicates needed to confidently infer binding strength. We find that the TF motif can be recovered despite poor and non-uniform extraction and PCR amplification efficiencies. The recovery of the motif is, however, affected to a larger extent by the fraction of sites that are either cooperatively or indirectly bound. Importantly, our simulations reveal that the number of ChIP-seq replicates needed to accurately measure in vivo occupancy at high-affinity sites is larger than the recommended community standards. Our results establish statistical limits on the accuracy of inferences of protein-DNA binding from ChIP-seq and suggest that increasing the mean extraction efficiency, rather than amplification efficiency, would better improve sensitivity. The source code and instructions for running ChIPulate can be found at https://github.com/vishakad/chipulate.
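
The chain of effects named in the abstract (binding, extraction, PCR amplification) can be illustrated with a toy simulation. The sketch below is not ChIPulate's code or model; it is a minimal illustration under assumed simplifications: a two-state equilibrium occupancy, one uniform extraction efficiency, and one uniform per-cycle PCR duplication probability. All function names, parameters, and default values are hypothetical.

```python
import math
import random


def occupancy(energy, mu=0.0):
    """Assumed two-state model: probability that a site is bound."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_site(energy, cells, extraction_eff, pcr_eff, cycles, rng):
    """Return the fragment count for one site after extraction and PCR."""
    # Number of cells in which the site is bound by the TF.
    bound = binomial(cells, occupancy(energy), rng)
    # Each bound fragment survives extraction with probability extraction_eff.
    fragments = binomial(bound, extraction_eff, rng)
    # In each PCR cycle, every fragment is duplicated with probability pcr_eff.
    for _ in range(cycles):
        fragments += binomial(fragments, pcr_eff, rng)
    return fragments


def simulate(energies, cells=200, extraction_eff=0.5, pcr_eff=0.8,
             cycles=5, seed=0):
    """Simulate fragment counts for a list of hypothetical site energies."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [
        simulate_site(e, cells, extraction_eff, pcr_eff, cycles, rng)
        for e in energies
    ]


if __name__ == "__main__":
    # Lower energy means stronger binding in this toy model.
    for energy, count in zip([-2.0, 0.0, 2.0], simulate([-2.0, 0.0, 2.0])):
        print(f"energy={energy:+.1f}  fragments={count}")
```

In this toy model, lowering `extraction_eff` reduces the number of fragments that enter amplification, so the relative noise introduced at that step is carried through every later cycle. Varying `extraction_eff` and `pcr_eff` separately gives a rough feel for the kind of comparison the abstract describes, without reproducing its results.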
