Department of Computational Medicine and Bioinformatics.
Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
Genome Res. 2020 Jul;30(7):1040-1046. doi: 10.1101/gr.258228.119. Epub 2020 Jul 6.
Transcription is tightly regulated by cis-regulatory DNA elements to which transcription factors (TFs) can bind. Thus, identification of TF binding sites (TFBSs) is key to understanding gene expression and entire regulatory networks within a cell. The standard approaches used for TFBS prediction, such as position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have drawbacks: high false-positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns; however, these also have limitations. We have developed a footprinting method to predict TF footprints in active chromatin elements (TRACE) to improve the prediction of TFBS footprints. TRACE incorporates DNase-seq data and PWMs within a multivariate hidden Markov model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that accurately annotates binding sites for specific TFs automatically, with no requirement for pregenerated candidate binding sites or ChIP-seq training data. Compared with published footprinting algorithms, TRACE has the best overall performance, with the distinct advantage of targeting multiple motifs in a single model.
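The abstract does not specify TRACE's states, emission distributions, or parameters, so the following is only a generic sketch of the idea it names (per-base DNase signal and PWM scores decoded jointly by a multivariate HMM), not TRACE's implementation. It assumes hypothetical states (`background`, `flank`, `footprint`), Gaussian emissions with diagonal covariance over two features (DNase signal, PWM log-odds score), and hand-picked parameter values. An unsupervised HMM would typically learn such parameters from the data (for example by expectation-maximization) instead of fixing them by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A PWM is a list of columns; each column maps a base to its probability.
PWM = Sequence[Dict[str, float]]


def pwm_log_odds(pwm: PWM, kmer: str, background: float = 0.25) -> float:
    """Log2-odds of a k-mer under the motif vs. a uniform base background."""
    if len(kmer) != len(pwm):
        raise ValueError("k-mer length must equal motif width")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, kmer))


def scan(pwm: PWM, seq: str) -> List[float]:
    """Score every offset at which the motif fits inside seq."""
    w = len(pwm)
    return [pwm_log_odds(pwm, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def gauss_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs: Sequence[Tuple[float, ...]],
            states: Sequence[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            emis: Dict[str, Sequence[Tuple[float, float]]]) -> List[str]:
    """Most likely state path. Each state emits one independent Gaussian per
    feature, i.e. a multivariate Gaussian with diagonal covariance."""

    def log_emit(s: str, o: Tuple[float, ...]) -> float:
        return sum(gauss_logpdf(x, m, v) for x, (m, v) in zip(o, emis[s]))

    score = [{s: log_start[s] + log_emit(s, obs[0]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at position t.
            prev = max(states, key=lambda p: score[t - 1][p] + log_trans[p][s])
            score[t][s] = score[t - 1][prev] + log_trans[prev][s] + log_emit(s, obs[t])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-position motif favouring "AG".
MOTIF = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
         {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]

STATES = ["background", "flank", "footprint"]
# Hypothetical (mean, variance) per feature: (DNase signal, PWM log-odds score).
EMIS = {
    "background": [(0.0, 1.0), (-5.0, 4.0)],
    "flank": [(5.0, 1.0), (-5.0, 4.0)],
    "footprint": [(1.0, 1.0), (8.0, 4.0)],
}
LOG_START = {s: math.log(1 / 3) for s in STATES}
LOG_TRANS = {p: {s: math.log(0.8 if p == s else 0.1) for s in STATES}
             for p in STATES}

# Toy per-position observations: a DNase dip with a strong motif match,
# flanked by highly accessible positions.
OBS = [(0.1, -6.0), (0.2, -4.0), (5.2, -5.0), (4.8, -6.0),
       (1.0, 9.0), (0.8, 8.0), (5.1, -4.0), (0.0, -5.0)]

if __name__ == "__main__":
    print(scan(MOTIF, "CAGT"))
    print(viterbi(OBS, STATES, LOG_START, LOG_TRANS, EMIS))
    # -> ['background', 'background', 'flank', 'flank',
    #     'footprint', 'footprint', 'flank', 'background']
```

In this toy run, the positions combining reduced accessibility with a high motif score are decoded as `footprint`, while highly accessible positions without a motif match are decoded as `flank`. Scoring both features jointly in one model is what separates this from calling footprints on the DNase signal alone or scanning motifs on sequence alone.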