Guo Haitao, Huo Hongwei, Yu Qiang
School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China.
PLoS One. 2016 Sep 16;11(9):e0162968. doi: 10.1371/journal.pone.0162968. eCollection 2016.
The discovery of cis-regulatory modules (CRMs) is a challenging problem in computational biology. Because a hidden Markov model (HMM) has difficulty modeling dependent features in transcriptional regulatory sequences (TRSs), HMM-based probabilistic modeling methods cannot accurately represent the distances between regulatory elements in TRSs, and they model the prevalent dependencies between motifs within CRMs only in a cumbersome way. We propose a probabilistic modeling algorithm called SMCis, which builds a more powerful CRM discovery model based on a hidden semi-Markov model. Our model characterizes the regulatory structure of CRMs and models dependencies between motifs effectively at a higher level of abstraction, working on segments rather than individual nucleotides. Experimental results on three benchmark datasets indicate that our method outperforms the algorithms it was compared against.
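The contrast between nucleotide-level and segment-level modeling can be illustrated with a small Python sketch. It is not the SMCis algorithm; it only shows why an explicit duration distribution, as used in hidden semi-Markov models, can describe the distance between regulatory elements more directly than an HMM self-loop. The names `hmm_duration_pmf`, `SPACER_DURATION`, and `sample_segments`, along with every number below, are hypothetical and chosen only for illustration.

```python
import random


def hmm_duration_pmf(self_loop, max_len):
    """Duration distribution implied by an HMM state with a self-loop.

    Staying d positions requires d - 1 self-transitions followed by one
    exit, so the distribution is geometric and always peaks at d = 1.
    """
    return [self_loop ** (d - 1) * (1.0 - self_loop)
            for d in range(1, max_len + 1)]


# Hypothetical explicit duration table for the spacer between two motifs.
# A semi-Markov state can use any such table, e.g. one peaked near 9-10 bp.
SPACER_DURATION = {8: 0.2, 9: 0.3, 10: 0.3, 11: 0.2}


def sample_segments(rng, n_motifs=3, motif_len=6):
    """Sample a toy CRM as a list of (label, length) segments.

    Each step emits a whole segment (a motif or a spacer) rather than a
    single nucleotide; spacer lengths are drawn from SPACER_DURATION.
    """
    lengths, probs = zip(*SPACER_DURATION.items())
    segments = []
    for i in range(n_motifs):
        segments.append(("motif", motif_len))
        if i < n_motifs - 1:
            spacer_len = rng.choices(lengths, weights=probs, k=1)[0]
            segments.append(("spacer", spacer_len))
    return segments


if __name__ == "__main__":
    pmf = hmm_duration_pmf(0.9, 20)
    print("most likely HMM duration:", pmf.index(max(pmf)) + 1)
    print("toy CRM segments:", sample_segments(random.Random(0)))
```

Under these assumptions, the HMM state always assigns its highest probability to a duration of one position, whatever the self-loop probability, while the segment-level sampler can place spacer lengths wherever the duration table puts its mass.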