Corcoran David L, Feingold Eleanor, Dominick Jessica, Wright Marietta, Harnaha Jo, Trucco Massimo, Giannoukakis Nick, Benos Panayiotis V
Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA.
Genome Res. 2005 Jun;15(6):840-7. doi: 10.1101/gr.2952005.
The search for mammalian DNA regulatory regions poses a challenging problem in computational biology. The short length of the DNA patterns relative to the size of the promoter regions, together with the degeneracy of the patterns, makes their identification difficult. One way to overcome this problem is to use evolutionary information to reduce the number of false-positive predictions. We developed a novel method for pattern identification that compares a pair of putative binding sites in two species (e.g., human and mouse) and assigns two probability scores, one based on the relative position of the sites in the promoter and one based on their agreement with a known model of binding preferences. We tested the algorithm's ability to predict known binding sites on various promoters. Overall, it exhibited 83% sensitivity and 72% specificity, which is a clear improvement over existing methods. Our algorithm also successfully predicted two novel NF-kappaB binding sites in the promoter region of the mouse autotaxin gene (ATX, ENPP2), which we were able to verify by chromatin immunoprecipitation assay coupled with quantitative real-time PCR.
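The idea of comparing a pair of putative sites across two species can be illustrated with a minimal sketch. This is not the published algorithm: the abstract does not give the two probability scores, so the sketch substitutes a plain log-odds score against a position weight matrix and a simple threshold on the difference in relative promoter position. The matrix values, the sequences, the thresholds, and the function names (`log_odds`, `best_site`, `conserved_pair`) are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 4-column position weight matrix (per-position base probabilities).
# The numbers are invented for illustration; this is NOT a real NF-kappaB model.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(site, pwm=PWM):
    """Sum of log2(p_base / background) over the site (uppercase ACGT only)."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix width")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def best_site(promoter, pwm=PWM):
    """Return (score, offset) of the best-scoring window in the promoter."""
    width = len(pwm)
    if len(promoter) < width:
        raise ValueError("promoter shorter than matrix width")
    return max(
        (log_odds(promoter[i:i + width], pwm), i)
        for i in range(len(promoter) - width + 1)
    )


def relative_position(offset, promoter_len, width):
    """Offset of the site scaled to [0, 1] along the promoter."""
    span = promoter_len - width
    return offset / span if span > 0 else 0.0


def conserved_pair(seq_a, seq_b, pwm=PWM, min_score=5.0, max_shift=0.2):
    """Return details of the best site pair if both sites match the matrix
    and sit at similar relative positions; otherwise return None.
    The thresholds stand in for the paper's probability scores."""
    width = len(pwm)
    score_a, off_a = best_site(seq_a, pwm)
    score_b, off_b = best_site(seq_b, pwm)
    shift = abs(
        relative_position(off_a, len(seq_a), width)
        - relative_position(off_b, len(seq_b), width)
    )
    if min(score_a, score_b) >= min_score and shift <= max_shift:
        return {"offsets": (off_a, off_b), "scores": (score_a, score_b), "shift": shift}
    return None


# Toy promoters (invented sequences, not real human or mouse data).
human = "ACTCGGATCTCA"
mouse = "ACCTCGGATTCA"
mouse_far = "GGATCCTCTCAC"  # same motif, but at a very different relative position

pair = conserved_pair(human, mouse)
rejected = conserved_pair(human, mouse_far)
```

In this toy setup, `pair` holds the matched offsets because both sites score well and lie at similar relative positions, while `rejected` is `None` because the second site is too far from the first in relative terms. A probabilistic treatment of both criteria, as the abstract describes, would replace the two hard thresholds.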