A flexible integrative approach based on random forest improves prediction of transcription factor binding sites.

Affiliation

Department of Biomedical Molecular Biology, Ghent University, B-9052 Ghent, Belgium.

Publication information

Nucleic Acids Res. 2012 Aug;40(14):e106. doi: 10.1093/nar/gks283. Epub 2012 Apr 5.

DOI:10.1093/nar/gks283

PMID:22492513

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3413102/

Abstract

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
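The abstract contrasts the PWM, which scores each position independently, with nucleotide positional dependencies (NPDs), and combines both in one random forest model. The sketch below illustrates that distinction under stated assumptions. It builds a log-odds PWM from a few hypothetical aligned sites, then concatenates the PWM score with a one-hot encoding of adjacent dinucleotides into a single feature vector of the kind a random forest could consume. The toy sites, the pseudocount, the uniform background, and all function names are illustrative assumptions, not the authors' feature set, and the DNA structural features are omitted.

```python
import math
from itertools import product

BASES = "ACGT"
# All 16 ordered dinucleotides: AA, AC, ..., TT.
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def build_log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a PWM of log2 odds against a uniform background (an assumption)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    # Additive score: each position contributes independently of the others.
    return sum(column[base] for column, base in zip(pwm, seq))


def dinucleotide_features(seq):
    # One-hot encode the adjacent pair starting at each position (16 slots per
    # position), so a learner can use dependencies between neighboring bases
    # that an additive PWM cannot represent.
    features = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        features.extend(1 if pair == d else 0 for d in DINUCS)
    return features


def feature_vector(pwm, seq):
    # Integrated representation: PWM score followed by the dinucleotide one-hots.
    return [pwm_score(pwm, seq)] + dinucleotide_features(seq)


# Hypothetical aligned 7-bp sites, for illustration only.
toy_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_log_odds_pwm(toy_sites)
vec = feature_vector(pwm, "TGACTCA")
print(len(vec))  # 97 = 1 PWM score + 6 positions x 16 dinucleotide slots
```

Labeled vectors for bound and unbound sequences could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; structural descriptors would be appended to each vector in the same way. Because the forest splits on individual features, it can weigh the PWM score against specific position-pair indicators rather than assuming positions are independent.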
