Farrel Alvin, Murphy Jonathan, Guo Jun-Tao
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA.
Bioinformatics. 2016 Jun 15;32(12):i306-i313. doi: 10.1093/bioinformatics/btw264.
Transcription factors (TFs) regulate gene expression by binding to specific target DNA sites. Accurate genome-scale annotation of transcription factor binding sites (TFBSs) is an essential step toward understanding gene regulatory networks. In this article, we present a structure-based method for computational prediction of TFBSs using a novel integrative energy (IE) function. The new energy function combines a multibody (MB) knowledge-based potential with two atomic energy terms (hydrogen bond and π interaction) that might not be accurately captured by the knowledge-based potential because of its mean-force nature and the low-count problem. We applied the new energy function to TFBS prediction using a non-redundant dataset of TFs from 12 different families. Our results show that the new IE function improves prediction accuracy over knowledge-based statistical potentials, especially for homeodomain TFs, the second largest TF family in mammals.
Supplementary data are available at Bioinformatics online.
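As a rough illustration of how an integrative energy of this kind could be used to rank candidate binding sites, the following minimal sketch combines three per-sequence energy terms into a single score. It is not the authors' implementation: the weighted linear sum, the default weights, the convention that lower energy means more favorable binding, the names `EnergyTerms`, `integrative_energy`, and `rank_sites`, and all numeric values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Hypothetical per-sequence energy terms for one TF-DNA complex model."""
    multibody: float       # knowledge-based multibody (MB) potential
    hydrogen_bond: float   # atomic hydrogen-bond term
    pi_interaction: float  # atomic pi-interaction term


def integrative_energy(terms: EnergyTerms,
                       w_mb: float = 1.0,
                       w_hb: float = 1.0,
                       w_pi: float = 1.0) -> float:
    """Combine the three terms as a weighted sum (an assumed functional form)."""
    return (w_mb * terms.multibody
            + w_hb * terms.hydrogen_bond
            + w_pi * terms.pi_interaction)


def rank_sites(candidates: dict[str, EnergyTerms], **weights: float) -> list[tuple[str, float]]:
    """Return (sequence, energy) pairs sorted from lowest to highest energy.

    Assumes lower energy corresponds to a more favorable predicted binding site.
    """
    scored = [(seq, integrative_energy(t, **weights)) for seq, t in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy input: made-up sequences and made-up energy values, for demonstration only.
toy_candidates = {
    "TAATTA": EnergyTerms(multibody=-2.0, hydrogen_bond=-0.5, pi_interaction=-0.25),
    "GGCCGG": EnergyTerms(multibody=-1.0, hydrogen_bond=-0.5, pi_interaction=0.0),
}

for sequence, energy in rank_sites(toy_candidates):
    print(sequence, energy)
```

In practice the individual terms would be computed from a TF-DNA complex structure for each candidate sequence; here they are supplied directly so that the sketch stays self-contained.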