Zellers Rowan G, Drewell Robert A, Dresch Jacqueline M
Department of Computer Science, Harvey Mudd College, 301 Platt Boulevard, Claremont CA, 91711, USA.
Department of Mathematics, Harvey Mudd College, 301 Platt Boulevard, Claremont CA, 91711, USA.
BMC Bioinformatics. 2015 Jan 31;16:30. doi: 10.1186/s12859-014-0446-3.
A key challenge in understanding the molecular mechanisms that control gene regulation is characterizing the specificity with which transcription factor proteins bind DNA sequences. Several computational approaches have been developed to examine these interactions, including simple mononucleotide and dinucleotide position weight matrix models.
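For readers unfamiliar with position weight matrices, the following is a minimal sketch of a mononucleotide model, assuming a log2-odds formulation with a uniform background and additive pseudocounts. The function names (`build_pwm`, `score`, `best_hit`) and the toy sites are hypothetical illustrations. They are not taken from the paper and do not represent real HUNCHBACK binding data.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds mononucleotide PWM from equal-length aligned sites.

    Assumes a uniform background frequency; real analyses often use
    genome-specific background frequencies instead.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds for a sequence as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Toy aligned sites, invented purely for illustration.
toy_sites = ["AAAAAA", "AAAAAT", "AAAAAA", "CAAAAA"]
toy_pwm = build_pwm(toy_sites)
```

Each position contributes independently to the score here. That independence assumption is what dinucleotide and gapped models relax.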
Here we develop a novel, unbiased computational algorithm, MARZ, that systematically analyzes all possible gapped matrices across a fixed number of nucleotides. In addition, to evaluate the ability of these matrix models to predict in vivo binding sites, we use a new scoring system and, in combination with established scoring methods and statistical analysis, test the performance of 32 different gapped matrices on the well-characterized HUNCHBACK transcription factor in Drosophila.
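The abstract does not say how MARZ enumerates its gapped matrices, so the following is only an illustrative sketch of one way to list every gap pattern over a fixed window. It assumes the first position is always included, so that shifted copies of the same pattern are not counted twice. Under that assumption a six-position window yields 2^5 = 32 patterns, which matches the number of models tested, although whether MARZ arrives at 32 this way is not confirmed here. The names `anchored_gap_masks` and `gapped_features` are hypothetical.

```python
from itertools import product

def anchored_gap_masks(width):
    """List all include/skip patterns over `width` positions.

    'X' marks an included nucleotide and '_' a gap. The first position is
    always 'X' (an assumption of this sketch, not a documented MARZ rule).
    """
    return ["X" + "".join(tail) for tail in product("X_", repeat=width - 1)]

def gapped_features(sequence, mask):
    """Slide a gap pattern along a sequence and collect the included letters.

    Trailing gaps are dropped so the pattern spans only the positions it uses.
    """
    core = mask.rstrip("_")
    keep = [i for i, c in enumerate(core) if c == "X"]
    return [
        "".join(sequence[start + i] for i in keep)
        for start in range(len(sequence) - len(core) + 1)
    ]

masks = anchored_gap_masks(6)
```

For example, the pattern `X_X___` pairs each nucleotide with the one two positions downstream. The resulting gapped dinucleotides could then be counted to build a matrix in the same way as ordinary dinucleotides.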
Our results indicate that in many cases gapped matrix models can outperform traditional models, but that the relative strength of the binding sites considered in the analysis can profoundly influence the predictive ability of specific models.