P Umesh, Dubey Jitendra Kumar, Rv Karthika, Cherian Betsy Sheena, Gopalakrishnan Gopakumar, Nair Achuthsankar Sukumaran
Department of Computational Biology and Bioinformatics, University of Kerala, Thiruvananthapuram - 695581, Kerala, India.
Department of Computer Science and Engineering, National Institute of Technology, Calicut - 673601, Kerala, India.
Bioinformation. 2014 Apr 23;10(4):175-9. doi: 10.6026/97320630010175. eCollection 2014.
Identifying promoters in DNA sequences by computational techniques is a significant research area because of its direct association with transcription regulation. A wide range of algorithms is available for promoter prediction, but most of them are polymerase dependent and cannot handle eukaryotes and prokaryotes alike. This study proposes a polymerase-independent algorithm that predicts whether a given DNA fragment is a promoter, based on sequence features and statistical elements. The algorithm considers all possible pentamers formed from the nucleotides A, C, G, and T, along with CpG islands, the TATA box, initiator elements, and downstream promoter elements. Its highlight is that it is not polymerase specific: it predicts for both eukaryotes and prokaryotes in the same computational manner, even though the underlying biological mechanisms of promoter recognition differ greatly. The proposed method, Promoter Prediction System (PPS-CBM), achieved sensitivity, specificity, and accuracy of 75.08%, 83.58%, and 79.33% on an E. coli data set and 86.67%, 88.41%, and 87.58% on a human data set. We have developed a tool based on PPS-CBM with which multiple sequences of varying lengths can be tested simultaneously; the results are reported in a comprehensive tabular format, together with the strength of each prediction.
The tool and source code of PPS-CBM are available at http://keralabs.org.
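The pentamer features mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how PPS-CBM encodes or weights pentamers, so everything below is an assumption made for illustration: the function name `pentamer_frequencies`, the use of overlapping windows, and the normalization to relative frequencies are hypothetical and are not taken from the PPS-CBM source code. The sketch only shows that the four nucleotides yield 4^5 = 1024 possible pentamers and how a sequence of any length can be mapped onto a fixed-length vector over them.

```python
from itertools import product

ALPHABET = "ACGT"
K = 5  # pentamers

# All 4**5 = 1024 possible pentamers, in lexicographic order.
PENTAMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(PENTAMERS)}


def pentamer_frequencies(sequence):
    """Return a 1024-element list of relative pentamer frequencies.

    Illustrative assumption, not the published PPS-CBM encoding:
    overlapping windows are counted, and windows containing any
    character other than A, C, G, or T are skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(PENTAMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or no valid window: all-zero vector.
        return [0.0] * len(PENTAMERS)
    return [c / total for c in counts]


if __name__ == "__main__":
    features = pentamer_frequencies("TATAAAAGGC")
    nonzero = {PENTAMERS[i]: f for i, f in enumerate(features) if f > 0}
    print(len(features), nonzero)
```

Because the vector length is fixed at 1024 regardless of input length, fragments of varying lengths can be scored by the same downstream model, which is consistent with the tool accepting sequences of varying lengths. Signals such as CpG islands, the TATA box, initiator elements, and downstream promoter elements would need separate features not shown here.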