Ramachandran Parameswaran, Perkins Theodore J
BMC Proc. 2013 Dec 20;7(Suppl 7):S7. doi: 10.1186/1753-6561-7-S7-S7.
High-throughput sequencing experiments can be viewed as measuring a "genomic signal" that may represent a biological event, such as the binding of a transcription factor to the genome or the locations of chromatin modifications, or even a background or control condition. Numerous algorithms have been developed to extract different kinds of information from such data. However, the reconstruction of the genomic signal itself has received very little attention. Such reconstructions may be useful for purposes ranging from simple visualization of the signals to sophisticated comparison of different datasets.
Here, we propose that adaptive-bandwidth kernel density estimators are well-suited for genomic signal reconstructions. This class of estimators is a natural extension of the fixed-bandwidth estimators that have been employed in several existing ChIP-Seq analysis programs.
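To illustrate the difference between the two classes of estimators, here is a minimal sketch in plain Python. It assumes a Gaussian kernel and a k-nearest-neighbour rule for the per-read bandwidth; both choices, and the names `fixed_kde`, `knn_bandwidths`, `adaptive_kde`, and `h_min`, are illustrative assumptions and not necessarily the scheme used in the paper. The code favours clarity over speed.

```python
import math

def gaussian(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(reads, x, h):
    """Fixed-bandwidth estimate at position x: every read uses the same h."""
    return sum(gaussian((x - r) / h) for r in reads) / (len(reads) * h)

def knn_bandwidths(reads, k, h_min=1.0):
    """Assumed adaptive rule: each read's bandwidth is the distance to its
    k-th nearest other read, floored at h_min. Requires k < len(reads)."""
    bandwidths = []
    for r in reads:
        dists = sorted(abs(r - s) for s in reads)  # dists[0] == 0 is the read itself
        bandwidths.append(max(dists[k], h_min))
    return bandwidths

def adaptive_kde(reads, x, bandwidths):
    """Adaptive-bandwidth estimate at x: each read contributes its own kernel width."""
    total = sum(gaussian((x - r) / h) / h for r, h in zip(reads, bandwidths))
    return total / len(reads)

# Hypothetical read positions: a dense cluster plus one isolated read.
reads = [100, 102, 104, 106, 108, 500]
bandwidths = knn_bandwidths(reads, k=2)
density_at_104 = adaptive_kde(reads, 104, bandwidths)
```

Under this rule, reads in dense regions get narrow kernels, which preserves sharp peaks, while isolated reads get wide kernels, which smooths sparse background. A fixed bandwidth has to compromise between the two.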
Using a set of ChIP-Seq datasets from the ENCODE project, we show that adaptive-bandwidth estimators reconstruct signals more accurately than fixed-bandwidth estimators, and that they also offer significant advantages for visualization. For both fixed and adaptive-bandwidth schemes, we demonstrate that smoothing parameters can be set automatically using a held-out set of tuning data. We also carry out a computational complexity analysis of the different schemes and confirm experimentally that the necessary computations can be readily carried out on a modern workstation.
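The idea of setting a smoothing parameter from held-out tuning data can be sketched as follows. The abstract does not state the selection criterion, so this example assumes held-out log-likelihood, a Gaussian kernel, and a small candidate grid. The function names, the synthetic read positions, and the candidate values are all hypothetical.

```python
import math
import random

def fixed_kde(train, x, h):
    """Fixed-bandwidth Gaussian kernel density estimate at position x."""
    norm = len(train) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - r) / h) ** 2) for r in train) / norm

def heldout_loglik(train, tune, h, floor=1e-300):
    """Log-likelihood of the tuning reads under the density built from the
    training reads. The floor avoids log(0) when the estimate underflows."""
    return sum(math.log(max(fixed_kde(train, x, h), floor)) for x in tune)

def select_bandwidth(train, tune, candidates):
    """Return the candidate bandwidth with the highest held-out log-likelihood."""
    return max(candidates, key=lambda h: heldout_loglik(train, tune, h))

# Synthetic read positions around a single peak (illustrative data only).
rng = random.Random(0)
reads = [rng.gauss(1000.0, 30.0) for _ in range(400)]
rng.shuffle(reads)
train, tune = reads[:300], reads[300:]

candidates = [1, 3, 10, 30, 100, 300, 1000]
best_h = select_bandwidth(train, tune, candidates)
```

The same pattern would apply to an adaptive scheme by searching over its own smoothing parameter (for instance, a neighbour count) instead of a single bandwidth.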