Ma Xiaoyan, Ezer Daphne, Navarro Carmen, Adryan Boris
Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK.
Cambridge Systems Biology Center, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, UK.
BMC Bioinformatics. 2015 Aug 20;16:265. doi: 10.1186/s12859-015-0666-1.
Scoring DNA sequences against Position Weight Matrices (PWMs) is a widely adopted method to identify putative transcription factor binding sites. While common bioinformatics tools produce scores that can reflect the binding strength between a specific transcription factor and the DNA, these scores are not directly comparable between different transcription factors. Other methods, including p-value-based approaches (Touzet H, Varré J-S. Efficient and accurate p-value computation for position weight matrices. Algorithms Mol Biol. 2007;2:15. doi: 10.1186/1748-7188-2-15), provide more rigorous ways to identify potential binding sites, but their results are difficult to interpret in terms of binding energy, which is essential for modeling transcription factor binding dynamics and enhancer activities.
Here, we provide two different ways to find the scaling parameter λ that allows us to infer binding energy from a PWM score. The first approach uses a PWM and background genomic sequence as input to estimate λ for a specific transcription factor, which we applied to show that λ distributions for different transcription factor families correspond with their DNA binding properties. Our second method can reliably convert λ between different PWMs of the same transcription factor, which allows us to directly compare PWMs that were generated by different approaches.
These two approaches provide computationally efficient ways to scale PWM scores and estimate the strength of transcription factor binding sites in quantitative studies of binding dynamics. In most cases, their results are consistent with each other and with previous reports.
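To make the role of a scaling parameter concrete, the following minimal sketch scores a site against a log-odds PWM and converts the score into a relative affinity. It is an illustration under stated assumptions, not the estimation procedure described in the paper: the function names, the pseudocount, the uniform background, and in particular the relation exp((S - S_max) / λ) are assumptions chosen here for illustration, and the paper may define λ with a different convention.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert per-position base counts into a natural-log log-odds PWM.

    `counts` is a list of dicts mapping each base to its count.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds entries selected by each base of `site`."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


def relative_affinity(pwm, site, lam):
    """Affinity relative to the best-scoring site, in (0, 1].

    Assumption: the energy gap to the best site is (S_max - S) / lam in
    units of kT, so the relative affinity is exp((S - S_max) / lam).
    """
    s_max = sum(max(column.values()) for column in pwm)
    return math.exp((score(pwm, site) - s_max) / lam)


# Hypothetical toy counts for a 3-bp motif with consensus "ACG".
toy_counts = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
for lam in (0.5, 1.0, 2.0):
    print(lam, relative_affinity(toy_pwm, "TCG", lam))
```

Under this convention the consensus site always has relative affinity 1, and a larger λ compresses the differences between sites. This is why raw scores from two PWMs with different effective λ values are not directly comparable.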