Mishra Hrishikesh, Singh Nitya, Misra Krishna, Lahiri Tapobrata
Division of Applied Sciences and Indo-Russian Centre for Biotechnology, Indian Institute of Information Technology, Allahabad, India.
Bioinformation. 2011;6(6):240-3. doi: 10.6026/97320630006240. Epub 2011 Jun 6.
Identification of promoter regions is an important part of gene annotation. Identifying promoters in eukaryotes matters because promoters modulate various metabolic functions and cellular stress responses. In this work, a novel approach using probe intensity values from tiling microarray data for the model eukaryotic plant Arabidopsis thaliana was used to distinguish promoter regions from non-promoter regions. A feed-forward back-propagation neural network model supported by a genetic algorithm was employed to predict the class of each data vector, using a window size of 41. The dataset comprised 2992 data vectors representing both promoter and non-promoter regions, chosen randomly from the whole-genome probe intensity vectors of Arabidopsis thaliana generated by tiling microarray technology. The classifier achieved prediction accuracies of 69.73% and 65.36% on the training and validation sets, respectively. Further, a concept of distance-based class membership was used to validate the reliability of the classifier, with promising results. The study shows that microarray probe intensities can be used to predict promoter regions in eukaryotic genomes.
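The pipeline summarized above (windows of 41 probe intensities, a feed-forward network trained by back-propagation, then a distance-based membership check) can be illustrated with a minimal pure-Python sketch. Several assumptions are made here, and none comes from the paper: a "window" is read as 41 consecutive probe intensities, the network has a single sigmoid hidden layer of 5 units trained by online gradient descent on squared error, and "distance-based class membership" is interpreted as inverse Euclidean distance to each class centroid. The genetic-algorithm component is omitted because the abstract does not say what it optimized. The names `sliding_windows`, `FeedForwardNet`, `distance_membership` and `make_synthetic` are hypothetical, and `make_synthetic` produces toy data, so its accuracy says nothing about the reported 69.73% / 65.36% figures.

```python
import math
import random

WINDOW = 41  # window size quoted in the abstract; assumed to mean 41 consecutive probes


def sliding_windows(intensities, size=WINDOW):
    """Split a probe-intensity track into every contiguous window of `size` values."""
    return [intensities[i:i + size] for i in range(len(intensities) - size + 1)]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """Assumed architecture: one sigmoid hidden layer, one sigmoid output unit."""

    def __init__(self, n_in, n_hidden=5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, X, T, epochs=40, lr=0.2):
        """Online back-propagation of squared error, one sample at a time."""
        for _ in range(epochs):
            for x, t in zip(X, T):
                h, y = self.forward(x)
                d_out = (y - t) * y * (1 - y)  # error gradient at the output unit
                # Hidden deltas use the output weights *before* they are updated
                d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * d_out * hj
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj

    def predict(self, x):
        """1 = promoter, 0 = non-promoter (label convention assumed)."""
        return 1 if self.forward(x)[1] >= 0.5 else 0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance_membership(x, centroids):
    """Assumed membership rule: share per class inversely proportional to centroid distance."""
    inv = {c: 1.0 / (math.dist(x, m) + 1e-12) for c, m in centroids.items()}
    total = sum(inv.values())
    return {c: v / total for c, v in inv.items()}


def make_synthetic(n, seed=1):
    """HYPOTHETICAL toy data: 'promoter' windows (label 1) have raised mean intensity."""
    rng = random.Random(seed)
    X, T = [], []
    for k in range(n):
        label = k % 2
        X.append([rng.gauss(1.0 if label else 0.0, 0.3) for _ in range(WINDOW)])
        T.append(label)
    return X, T


if __name__ == "__main__":
    X, T = make_synthetic(100)
    net = FeedForwardNet(n_in=WINDOW)
    net.train(X, T)
    acc = sum(net.predict(x) == t for x, t in zip(X, T)) / len(X)
    cents = {c: centroid([x for x, t in zip(X, T) if t == c]) for c in (0, 1)}
    print("training accuracy:", acc)
    print("membership of first window:", distance_membership(X[0], cents))
```

In this sketch the membership score is computed independently of the network, so a window whose predicted class also holds the larger membership share can be treated as a more reliable call. That is one plausible way to use class membership as a reliability check, not necessarily the authors' procedure.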