Bi Chengpeng
Children's Mercy Hospitals and Clinics, 2401 Gillham Road, Pediatrics Research Building, Third Floor, Kansas City, Missouri 64108, USA.
J Bioinform Comput Biol. 2007 Feb;5(1):47-77. doi: 10.1142/s0219720007002527.
Position weight matrix-based statistical modeling for the identification and characterization of motif sites in a set of unaligned biopolymer sequences is presented. This paper describes and implements a new algorithm, the Stochastic EM-type Algorithm for Motif-finding (SEAM), and redesigns and implements the EM-based motif-finding algorithm called deterministic EM (DEM) for comparison with SEAM, its stochastic counterpart. The gold standard example, cyclic adenosine monophosphate receptor protein (CRP) binding sequences, together with other biological sequences, is used to illustrate the performance of the new algorithm and compare it with other popular motif-finding programs. The convergence of the new algorithm is shown by simulation. The in silico experiments using simulated and biological examples illustrate the power and robustness of the new algorithm SEAM in de novo motif discovery.
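The abstract does not give SEAM's update rules. The sketch below only illustrates the general contrast it draws: deterministic EM (DEM), which re-estimates a position weight matrix from soft posterior site weights, versus a stochastic EM, which samples one site per sequence from that posterior and re-estimates from the sampled sites. It is not the paper's implementation. Everything here is an assumption: a DNA alphabet, exactly one motif site per sequence (the "OOPS" model), a uniform prior over site positions, a zero-order background estimated from the input, a pseudocount of 0.5, initialization from one random w-mer, and keeping the best-likelihood matrix the stochastic chain visits. All function names (`posteriors`, `m_step`, `em_iteration`, `find_motif`, and the rest) are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}


def background(seqs):
    """Zero-order background base frequencies, add-one smoothed."""
    counts = [1.0] * len(ALPHABET)
    for seq in seqs:
        for c in seq:
            counts[IDX[c]] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def window_scores(seq, pwm, bg):
    """Log-odds (PWM vs. background) of every width-w window in seq."""
    w = len(pwm)
    return [
        sum(math.log(pwm[j][IDX[seq[s + j]]] / bg[IDX[seq[s + j]]]) for j in range(w))
        for s in range(len(seq) - w + 1)
    ]


def posteriors(seq, pwm, bg):
    """E-step for one sequence: P(site starts at s | seq), assuming one site
    per sequence (OOPS) and a uniform prior over start positions."""
    scores = window_scores(seq, pwm, bg)
    top = max(scores)  # shift by the max so exp() cannot overflow
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    return [x / total for x in weights]


def m_step(seqs, site_weights, w, pseudo=0.5):
    """M-step: PWM from weighted w-mer counts plus a pseudocount per cell."""
    counts = [[pseudo] * len(ALPHABET) for _ in range(w)]
    for seq, weights in zip(seqs, site_weights):
        for s, p in enumerate(weights):
            if p > 0.0:
                for j in range(w):
                    counts[j][IDX[seq[s + j]]] += p
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seqs, pwm, bg):
    """OOPS log-likelihood minus the model-independent log P(seq | background)."""
    ll = 0.0
    for seq in seqs:
        scores = window_scores(seq, pwm, bg)
        top = max(scores)
        ll += top + math.log(sum(math.exp(x - top) for x in scores) / len(scores))
    return ll


def em_iteration(seqs, pwm, bg, rng=None):
    """rng is None: deterministic EM, the M-step uses soft posterior weights.
    Otherwise stochastic EM: sample one site per sequence from its posterior
    (S-step) and re-estimate the PWM from those hard 0/1 assignments."""
    site_weights = []
    for seq in seqs:
        post = posteriors(seq, pwm, bg)
        if rng is not None:
            s = rng.choices(range(len(post)), weights=post)[0]
            post = [1.0 if i == s else 0.0 for i in range(len(post))]
        site_weights.append(post)
    return m_step(seqs, site_weights, len(pwm))


def find_motif(seqs, w, n_iter=50, stochastic=True, seed=0):
    """Start from a random w-mer of seqs[0] and iterate; return (pwm, loglik).
    A stochastic chain keeps moving, so the best PWM visited is returned."""
    rng = random.Random(seed)
    bg = background(seqs)
    n_pos = len(seqs[0]) - w + 1
    start = rng.randrange(n_pos)
    pwm = m_step([seqs[0]], [[1.0 if s == start else 0.0 for s in range(n_pos)]], w)
    best = (pwm, log_likelihood(seqs, pwm, bg))
    for _ in range(n_iter):
        pwm = em_iteration(seqs, pwm, bg, rng if stochastic else None)
        ll = log_likelihood(seqs, pwm, bg)
        if ll > best[1]:
            best = (pwm, ll)
    return best
```

For example, `find_motif(seqs, 6, stochastic=True, seed=1)` runs the sampling variant and `stochastic=False` runs the deterministic one from the same starting w-mer. Because the S-step injects randomness, a stochastic chain can leave a starting point where deterministic EM would settle, which is why this sketch tracks the best matrix seen rather than the last one. It makes no claim about SEAM's actual convergence behavior or performance.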