Computational and Systems Biology, Genome Institute of Singapore, Singapore.
Nat Biotechnol. 2013 Jul;31(7):615-22. doi: 10.1038/nbt.2596. Epub 2013 Jun 16.
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
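To make the signal-detection framing concrete, the sketch below applies a generic matched (linear) filter to a binned read-count signal and reports local peaks in the filter output. It is an illustration of the general idea only, not DFilter itself: the function name `matched_filter_detect`, the zero-mean unit-norm kernel, the Gaussian template and the fixed threshold are all assumptions made for this example, and the abstract does not specify how DFilter constructs its filter.

```python
import numpy as np

def matched_filter_detect(signal, template, threshold):
    """Correlate a binned read-count signal with a template and return
    (score, peaks), where peaks are local maxima of score >= threshold.

    Hypothetical helper for illustration; not the DFilter algorithm.
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)

    # Zero-mean, unit-norm kernel: a flat background contributes no score
    # away from the array edges (edges are zero-padded by mode="same").
    kernel = template - template.mean()
    kernel /= np.linalg.norm(kernel)

    # Sliding dot product of the signal with the kernel, aligned so that
    # score[i] is centered on bin i for an odd-length template.
    score = np.correlate(signal, kernel, mode="same")

    # Keep bins that exceed the threshold and are local maxima.
    peaks = [
        i for i in range(1, len(score) - 1)
        if score[i] >= threshold
        and score[i] > score[i - 1]
        and score[i] >= score[i + 1]
    ]
    return score, peaks


if __name__ == "__main__":
    # Synthetic example: one Gaussian-shaped enrichment centered on bin 100.
    x = np.arange(-10, 11)
    template = np.exp(-x**2 / 18.0)
    signal = np.zeros(200)
    signal[90:111] += 5.0 * template
    score, peaks = matched_filter_detect(signal, template, threshold=1.0)
    print(peaks)
```

In practice the template and threshold would have to be chosen per assay (for example, narrow for transcription factor ChIP-seq, broad for histone marks); those choices are left open here.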