Ruan Shuxiang, Stormo Gary D
Department of Genetics and The Edison Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America.
PLoS Comput Biol. 2017 Jul 7;13(7):e1005638. doi: 10.1371/journal.pcbi.1005638. eCollection 2017 Jul.
The specificities of transcription factors are most commonly represented with probabilistic models. These models provide a probability for each base occurring at each position within the binding site, and the positions are assumed to contribute independently. The model is simple and intuitive and is the basis for many motif discovery algorithms. However, the model also has inherent limitations that prevent it from accurately representing true binding probabilities, especially for the highest-affinity sites under conditions of high protein concentration. The limitations are not due to the assumption of independence between positions; rather, they are caused by the non-linear relationship between binding affinity and binding probability and by the fact that independent normalization at each position skews the site probabilities. In general, probabilistic models are reasonably good approximations, but new high-throughput methods allow for biophysical models with increased accuracy, which should be used whenever possible.
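The non-linearity described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code. It assumes a standard two-state occupancy form, P(bound | site) = 1 / (1 + exp(E - mu)), with the site energy E in units of kT and mu standing for the log of the free protein concentration. The energy table `ENERGY`, the 3-bp site length, and all function names are hypothetical and chosen only for illustration. The sketch covers the saturation effect only, not the skew from per-position normalization.

```python
import math

# Hypothetical additive energies (units of kT) for a 3-bp site.
# 0.0 marks the preferred base at each position; values are illustrative only.
ENERGY = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 2.0},
    {"A": 2.0, "C": 0.0, "G": 1.0, "T": 2.0},
    {"A": 2.0, "C": 2.0, "G": 0.0, "T": 1.0},
]


def site_energy(site):
    """Sum per-position energies (positions contribute independently)."""
    return sum(pos[base] for pos, base in zip(ENERGY, site))


def binding_probability(site, mu):
    """Assumed occupancy model: 1 / (1 + exp(E - mu))."""
    return 1.0 / (1.0 + math.exp(site_energy(site) - mu))


def pwm_probability(site):
    """Probabilistic model: product of per-position Boltzmann probabilities,
    each normalized independently over the four bases."""
    prob = 1.0
    for pos, base in zip(ENERGY, site):
        z = sum(math.exp(-e) for e in pos.values())
        prob *= math.exp(-pos[base]) / z
    return prob


def ratio_table(best="ACG", weaker="ACT", mus=(-6.0, 0.0, 6.0)):
    """Compare the occupancy ratio of two sites with their PWM ratio."""
    pwm_ratio = pwm_probability(best) / pwm_probability(weaker)
    rows = []
    for mu in mus:
        occ_ratio = binding_probability(best, mu) / binding_probability(weaker, mu)
        rows.append((mu, occ_ratio, pwm_ratio))
    return rows


if __name__ == "__main__":
    for mu, occ_ratio, pwm_ratio in ratio_table():
        print(f"mu={mu:+.1f}  occupancy ratio={occ_ratio:.3f}  PWM ratio={pwm_ratio:.3f}")
```

Under these assumptions, the occupancy ratio of the two sites stays close to the PWM ratio at low protein concentration (very negative mu) and falls toward 1 as both sites approach saturation at high concentration. The PWM ratio does not depend on concentration and so cannot follow that change.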