Pirino Davide, Rigosa Jacopo, Ledda Alice, Ferretti Luca
LEM, Scuola Superiore Sant'Anna, 56127 Pisa, Italy.
Phys Rev E Stat Nonlin Soft Matter Phys. 2012 Jun;85(6 Pt 2):066124. doi: 10.1103/PhysRevE.85.066124. Epub 2012 Jun 19.
Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Such words are identified by rejecting Markov models of the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper, a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features such as gene occurrences. Our findings confirm previous stylized facts about these types of interactions and shed new light on the genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.