King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia.
Department of Sciences and Technologies, University 'Parthenope' of Naples, Centro Direzionale Isola C4 80143, Naples, Italy.
Nucleic Acids Res. 2018 Jul 6;46(12):e72. doi: 10.1093/nar/gky237.
Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods for TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved. To overcome these problems, we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, a 1.54-, 1.96- and 5.19-fold reduction in false positives at the same sensitivities compared with models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that the PWM models used for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented as a web tool and as stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF.
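The comparison metric quoted above (fold reduction of false positives at the same sensitivity) can be illustrated with a minimal sketch. This is not the DRAF evaluation code. The function names are hypothetical, and the scores are made-up toy values chosen only to show the arithmetic. The sketch assumes each predictor assigns a score to every known binding site (positive) and every background sequence (negative), and that a sequence is called a site when its score is at or above a threshold.

```python
import math


def threshold_at_sensitivity(pos_scores, sensitivity):
    """Highest threshold that still recovers at least the requested
    fraction of positives (a score >= threshold counts as a hit)."""
    ranked = sorted(pos_scores, reverse=True)
    # Small epsilon guards against floating-point noise in the product.
    k = max(1, math.ceil(sensitivity * len(ranked) - 1e-9))
    return ranked[k - 1]


def false_positives_at_sensitivity(pos_scores, neg_scores, sensitivity):
    """Count negatives scoring at or above the sensitivity-matched threshold."""
    t = threshold_at_sensitivity(pos_scores, sensitivity)
    return sum(1 for s in neg_scores if s >= t)


def fold_reduction(fp_baseline, fp_new):
    """Ratio of baseline false positives to those of the new model."""
    if fp_new == 0:
        return math.inf
    return fp_baseline / fp_new


# Toy, invented scores for illustration only (not data from the paper).
baseline_pos = [0.9, 0.8, 0.7, 0.6, 0.4]
baseline_neg = [0.75, 0.65, 0.5, 0.3, 0.2, 0.1]
new_pos = [0.95, 0.9, 0.85, 0.7, 0.3]
new_neg = [0.8, 0.6, 0.4, 0.2, 0.1, 0.05]

sens = 0.8  # both models are compared at 80% sensitivity
fp_base = false_positives_at_sensitivity(baseline_pos, baseline_neg, sens)
fp_new = false_positives_at_sensitivity(new_pos, new_neg, sens)
print(fp_base, fp_new, fold_reduction(fp_base, fp_new))
```

Matching the two models at the same sensitivity before counting false positives is what makes the ratio comparable across predictors whose raw scores are on different scales.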