Mahadevan I, Ghosh I
Astra Research Centre India, Bangalore.
Nucleic Acids Res. 1994 Jun 11;22(11):2158-65. doi: 10.1093/nar/22.11.2158.
A backpropagation neural network was trained to identify E. coli promoters of all spacing classes (15 to 21). A three-module approach is employed: the first neural network module predicts the consensus boxes, the second module aligns the promoters to a length of 65 bases, and the third neural network module evaluates the entire 65-base sequence, taking into account possible interdependencies between bases in the promoters. The networks were trained with 106 promoters and with random sequences that were 60% AT-rich, and were tested on 126 promoters (bacterial, mutant and phage promoters). The network was 98% successful in promoter recognition, and 90.2% successful in non-promoter recognition when tested on 5000 randomly generated sequences. The network was further trained with 11 mutated non-promoters and 8 mutated promoters of the p22ant promoter, and it then identified a test set of 7 mutated promoters and 13 mutated non-promoters of p22ant. The network was upgraded using a total of 1665 promoter and non-promoter sequences so that it could locate promoter sequences within gene sequences. The network identified the locations of the P1, P2 and P3 promoters in the pBR322 plasmid. A string search for the start codon, ribosome binding site and stop codon has also been added to find the candidate promoters that can yield protein products. The network was also successfully tested on a synthetic plasmid, pWM528.
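The abstract describes negative training examples as random sequences that are 60% AT-rich, and promoters aligned to a 65-base window, but it does not say how bases were fed to the network. The sketch below assumes a one-hot encoding (four binary inputs per base) and adds a helper that slides a 65-base window along a longer sequence, as one would when scanning a plasmid such as pBR322. The names `random_at_rich`, `one_hot` and `windows` are hypothetical illustrations, not the authors' code.

```python
import random

BASES = "ACGT"
WINDOW = 65  # alignment length quoted in the abstract


def random_at_rich(length, at_fraction=0.60, rng=None):
    """Return a random sequence whose bases are A or T with probability
    at_fraction (split evenly between A and T) and G or C otherwise."""
    if rng is None:
        rng = random.Random()
    at_half = at_fraction / 2
    gc_half = (1.0 - at_fraction) / 2
    # Weights follow the order of BASES: A, C, G, T.
    weights = [at_half, gc_half, gc_half, at_half]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def one_hot(seq):
    """Encode each base as four binary inputs (A, C, G, T order).
    This encoding is an assumption; the abstract does not specify one."""
    vec = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def windows(seq, width=WINDOW, step=1):
    """Yield (offset, window) for every full-width window of seq."""
    for i in range(0, len(seq) - width + 1, step):
        yield i, seq[i:i + width]
```

Under this encoding, one 65-base window becomes 65 × 4 = 260 network inputs, so `len(one_hot(random_at_rich(WINDOW)))` is 260.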
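Each module in the abstract is a backpropagation network, but the layer sizes, activation functions and training schedule are not given. As a generic, textbook illustration only, the sketch below trains a network with one sigmoid hidden layer and one sigmoid output by batch gradient descent on half the mean squared error. `TwoLayerNet` and its parameters are hypothetical; in a promoter classifier, `X` would hold encoded 65-base windows and `t` would be 1 for promoters and 0 for non-promoters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TwoLayerNet:
    """Input -> sigmoid hidden layer -> single sigmoid output, trained by
    batch gradient descent on E = 0.5 * mean((y - t) ** 2).
    Layer sizes, initialisation and learning rate are illustrative only."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        """Return outputs of shape (n_samples, 1); caches hidden activations."""
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.y = sigmoid(self.h @ self.W2 + self.b2)
        return self.y

    def train_step(self, X, t, lr=0.5):
        """Run one backpropagation update; return the MSE before the update."""
        y = self.forward(X)
        err = y - t.reshape(-1, 1)
        # Output delta: dE/d(net input of the output unit), incl. sigmoid slope.
        d2 = err * y * (1.0 - y) / len(X)
        # Hidden deltas: propagate d2 back through W2 before W2 is updated.
        d1 = (d2 @ self.W2.T) * self.h * (1.0 - self.h)
        self.W2 -= lr * (self.h.T @ d2)
        self.b2 -= lr * d2.sum(axis=0)
        self.W1 -= lr * (X.T @ d1)
        self.b1 -= lr * d1.sum(axis=0)
        return float(np.mean(err ** 2))
```

The hidden deltas are computed before any weights change, so every update uses gradients from the same forward pass.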
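The abstract adds a string search for a ribosome binding site, a start codon and a stop codon downstream of a predicted promoter, but gives no motifs or distances. The sketch below assumes a Shine-Dalgarno core of `AGGAGG`, an `ATG` start 5 to 10 bases after it, and the stop codons `TAA`, `TAG` and `TGA` read in frame. The function `find_orf_after` and all of its parameters are hypothetical.

```python
# All parameters below are hypothetical; the abstract names the three
# signals but not the motifs, spacings or search distances.
RBS_MOTIF = "AGGAGG"            # example Shine-Dalgarno core
START_CODON = "ATG"
STOP_CODONS = ("TAA", "TAG", "TGA")


def find_orf_after(seq, promoter_end, spacer=(5, 10), max_search=200):
    """Search downstream of promoter_end for RBS -> ATG -> in-frame stop.

    Returns (rbs_index, start_index, stop_index) for the first hit, or None.
    The ATG must begin spacer[0]..spacer[1] bases after the end of the RBS,
    and the RBS must lie within max_search bases of promoter_end.
    """
    seq = seq.upper()
    region_end = min(len(seq), promoter_end + max_search)
    rbs = seq.find(RBS_MOTIF, promoter_end, region_end)
    while rbs != -1:
        rbs_end = rbs + len(RBS_MOTIF)
        for start in range(rbs_end + spacer[0], rbs_end + spacer[1] + 1):
            if seq.startswith(START_CODON, start):
                # Walk codon by codon until the first in-frame stop codon.
                for pos in range(start + 3, len(seq) - 2, 3):
                    if seq[pos:pos + 3] in STOP_CODONS:
                        return rbs, start, pos
        rbs = seq.find(RBS_MOTIF, rbs + 1, region_end)
    return None
```

For the constructed sequence `"CCCC" + "AGGAGG" + "TTTTTTT" + "ATG" + "GCT" * 3 + "TAA" + "CCC"`, `find_orf_after(seq, 0)` returns `(4, 17, 29)`. In a plasmid scan, `promoter_end` could be taken as the offset of a positive 65-base window plus 65.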