Yang Jichen, Ramsey Stephen A
Department of Biomedical Sciences and School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA.
Bioinformatics. 2015 Nov 1;31(21):3445-50. doi: 10.1093/bioinformatics/btv391. Epub 2015 Jun 30.
The position-weight matrix (PWM) is a useful representation of a transcription factor binding site (TFBS) sequence pattern because the PWM can be estimated from a small number of representative TFBS sequences. However, because the PWM probability model assumes independence between individual nucleotide positions, the PWMs for some TFs poorly discriminate binding sites from non-binding-sites that have similar sequence content. Since the local three-dimensional DNA structure ('shape') is a determinant of TF binding specificity and since DNA shape has a significant sequence-dependence, we combined DNA shape-derived features into a TF-generalized regulatory score and tested whether the score could improve PWM-based discrimination of TFBS from non-binding-sites.
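The independence assumption can be made concrete with a minimal Python sketch. It is not taken from the regshape package: the function names `build_pwm` and `log_odds_score`, the example sites, the pseudocount and the uniform background are all illustrative assumptions. It estimates a PWM from a handful of aligned sites and scores a candidate sequence as a sum of per-position log-odds terms, so each position contributes independently of its neighbours.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites.

    The pseudocount (an assumed value) keeps unseen bases from getting
    probability zero when only a few sites are available.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_odds_score(pwm, seq, background=0.25):
    """Score seq as a sum of independent per-position log2 odds ratios.

    A uniform background frequency is assumed for simplicity.
    """
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM width")
    return sum(math.log2(pwm[i][b] / background) for i, b in enumerate(seq))


# Hypothetical aligned binding sites, used only for illustration.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAATCA"]
pwm = build_pwm(sites)

site_like = log_odds_score(pwm, "TGACTCA")
unrelated = log_odds_score(pwm, "AAAAAAA")
print(site_like > unrelated)
```

Because the score is a plain sum over positions, it cannot capture dependencies between neighbouring nucleotides, which is the limitation the shape-derived features are meant to address.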
We compared a traditional PWM model with a model that combines the PWM with a DNA shape feature-based regulatory potential score, assessing each model's accuracy in detecting binding sites for 75 vertebrate transcription factors. The PWM+shape model was more accurate than the PWM-only model for 45% of the TFs tested, with no significant loss of accuracy for the remaining TFs.
The shape-based model is available as an open-source R package archived on GitHub at https://github.com/ramseylab/regshape/.
stephen.ramsey@oregonstate.edu
Supplementary data are available at Bioinformatics online.