Myschyshyn Mike, Farren-Dai Marco, Chuang Tien-Jui, Vocadlo David
Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
Department of Chemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
BMC Bioinformatics. 2017 Nov 25;18(1):521. doi: 10.1186/s12859-017-1936-x.
Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome-wide distribution of chromatin-associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is the study of time-dependent changes in the distribution of such proteins and marks using serial ChIP-seq experiments performed in a time-resolved manner. Although such time-resolved studies are becoming increasingly common, software that supports robust, automated analysis of the resulting data is limited.
We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. Using two simulated data sets, we provide users with guidance on experimental design when modeling time course (TC) ChIP-seq data with TDCA. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user-supplied aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format.
TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. TC experiments are reduced to intuitive parametric values that facilitate biologically relevant data analysis and help uncover variations in the time-dependent behavior of chromatin. TDCA automates the analysis of TC ChIP-seq experiments, permitting researchers to easily obtain raw and modeled data for specific loci or groups of loci with similar behavior, while also enhancing the consistency with which TC data are analyzed within the genomics field.
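To illustrate what reducing a time course to a few parametric values can look like, the following is a minimal sketch of fitting a sigmoidal curve to coverage at a single locus. It is not TDCA's implementation, and the abstract does not specify TDCA's exact model: the four-parameter logistic form, the parameter names (baseline, amplitude, rate, t_half), the grid-search fitting procedure, and the synthetic coverage values are all assumptions made for this example.

```python
import math


def sigmoid(t, baseline, amplitude, rate, t_half):
    """Four-parameter logistic curve (assumed form, not taken from TDCA)."""
    return baseline + amplitude / (1.0 + math.exp(-rate * (t - t_half)))


def fit_sigmoid(times, coverage, n_grid=60):
    """Fit the assumed logistic model to one locus by least squares.

    rate and t_half are searched on a coarse grid; for each pair, baseline
    and amplitude enter the model linearly, so they are solved in closed
    form by simple linear regression.
    """
    t_min, t_max = min(times), max(times)
    span = t_max - t_min
    n = len(times)
    y_mean = sum(coverage) / n
    best = None
    for i in range(n_grid):
        t_half = t_min + span * i / (n_grid - 1)
        for j in range(n_grid):
            # Log-spaced rates between 0.1/span and 100/span.
            rate = (0.1 / span) * 1000.0 ** (j / (n_grid - 1))
            s = [1.0 / (1.0 + math.exp(-rate * (t - t_half))) for t in times]
            s_mean = sum(s) / n
            s_var = sum((x - s_mean) ** 2 for x in s)
            if s_var < 1e-12:
                continue  # curve is flat over the sampled times; skip
            cov = sum((x - s_mean) * (y - y_mean) for x, y in zip(s, coverage))
            amplitude = cov / s_var
            baseline = y_mean - amplitude * s_mean
            sse = sum((baseline + amplitude * x - y) ** 2
                      for x, y in zip(s, coverage))
            if best is None or sse < best[0]:
                best = (sse, baseline, amplitude, rate, t_half)
    sse, baseline, amplitude, rate, t_half = best
    return {"baseline": baseline, "amplitude": amplitude,
            "rate": rate, "t_half": t_half, "sse": sse}


# Synthetic, noise-free coverage for one hypothetical locus (minutes, reads).
times = [0, 10, 20, 30, 40, 50, 60, 80, 100, 120]
coverage = [sigmoid(t, 5.0, 40.0, 0.15, 35.0) for t in times]

fit = fit_sigmoid(times, coverage)
print(fit)
```

In this sketch, t_half is the time at which coverage reaches the midpoint of its change, amplitude is the size of that change (negative for a falling signal), and rate controls how abruptly the transition occurs. Because the search runs on a coarse grid, the recovered values are approximate; a gradient-based least-squares routine could refine them.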