A Statistical Framework for the Analysis of ChIP-Seq Data.

Author information

Kuan Pei Fen, Chung Dongjun, Pan Guangjin, Thomson James A, Stewart Ron, Keleş Sündüz

Affiliations

Departments of Statistics and of Biostatistics and Medical Informatics.

Genome Center of Wisconsin and Morgridge Institute for Research.

Publication information

J Am Stat Assoc. 2011;106(495):891-903. doi: 10.1198/jasa.2011.ap09706. Epub 2012 Jan 24.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide studies of transcriptional regulation. Despite the increasing and well-deserved popularity of ChIP-Seq, little work has investigated or accounted for sources of bias in the technology. These biases typically arise both from the standard pre-processing protocol and from the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand the factors affecting the background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of bias such as mappability and GC content, and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate the advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies.
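The abstract describes the model only at a high level. As a rough illustration of the general idea, the sketch below scores a single genomic bin under a two-component mixture: a negative binomial background whose mean is log-linear in the bin's mappability and GC content, and an enriched component that adds an extra signal mean on top of that background. The log-linear form, the simplified enriched component, the parameter values, and all names (`BackgroundModel`, `posterior_enriched`, `nb_logpmf`) are hypothetical assumptions made for illustration. This is not the MOSAiCS implementation, which estimates its parameters from data.

```python
import math
from dataclasses import dataclass


def nb_logpmf(y: int, size: float, mean: float) -> float:
    """Log-probability of count y under a negative binomial with the given size and mean."""
    p = size / (size + mean)  # success probability in the (size, p) parameterization
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(p) + y * math.log1p(-p)
    )


@dataclass
class BackgroundModel:
    """Hypothetical background: log(mean) is linear in mappability and GC content."""
    b0: float     # intercept
    b_map: float  # effect of bin mappability (fraction in [0, 1])
    b_gc: float   # effect of bin GC content (fraction in [0, 1])
    size: float   # negative binomial size (overdispersion) parameter

    def mean(self, mappability: float, gc: float) -> float:
        return math.exp(self.b0 + self.b_map * mappability + self.b_gc * gc)


def posterior_enriched(y, mappability, gc, bg, pi_enriched, signal_size, signal_mean):
    """Posterior probability that a bin with count y comes from the enriched component."""
    mu_bg = bg.mean(mappability, gc)
    # Background component: counts explained by mappability/GC-driven noise alone.
    log_bg = math.log1p(-pi_enriched) + nb_logpmf(y, bg.size, mu_bg)
    # Enriched component (simplified assumption): background mean plus a signal mean.
    log_en = math.log(pi_enriched) + nb_logpmf(y, signal_size, mu_bg + signal_mean)
    # Normalize in log space to avoid underflow for large counts.
    top = max(log_bg, log_en)
    w_bg, w_en = math.exp(log_bg - top), math.exp(log_en - top)
    return w_en / (w_bg + w_en)


# Usage with arbitrary, made-up parameter values.
bg = BackgroundModel(b0=0.5, b_map=1.0, b_gc=-0.5, size=5.0)
for count in (2, 30):
    prob = posterior_enriched(count, mappability=1.0, gc=0.4, bg=bg,
                              pi_enriched=0.05, signal_size=2.0, signal_mean=20.0)
    print(count, round(prob, 4))
```

In a real analysis, the coefficients, dispersion, and mixing weight would be estimated from the data (for example by EM) rather than fixed by hand. A two-sample analysis would presumably also let the background mean depend on the control sample's counts, which this sketch omits.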

Similar articles

A fully Bayesian hidden Ising model for ChIP-seq data analysis.
Biostatistics. 2012 Jan;13(1):113-28. doi: 10.1093/biostatistics/kxr029. Epub 2011 Sep 13.
