O'Neill M C
Department of Biological Sciences, University of Maryland, Baltimore County 21228.
Nucleic Acids Res. 1991 Jan 25;19(2):313-8. doi: 10.1093/nar/19.2.313.
A three-layered back-propagation neural network was trained to recognize E. coli promoters of the 17 base spacing class. To this end, the network was presented with 39 promoter sequences and derivatives of those sequences as positive inputs; random sequences with 60% A+T content and sequences carrying two promoter-down point mutations were used as negative inputs. The entire 58-base promoter sequence, spanning approximately -50 to +8, was entered as input. The network was asked to associate an output of 1.0 with promoter sequence input and 0.0 with non-promoter input. Generally, after 100,000 input cycles, the network classified the training set virtually perfectly. A trained network recognized about 80% of 'new' promoters that were not in the training set, with a false positive rate below 0.1%. Network searches of pBR322 and the lambda genome were also performed. Overall, the results were somewhat better than those of the best rule-based procedures. The trained network can be analyzed for both its choice of base and its relative weighting, positive and negative, at each position of the sequence. This method, which requires only appropriate input/output training pairs, can be used to define and search for any DNA regulatory sequence for which there are sufficient exemplars.
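The abstract does not state the input encoding, hidden-layer size, learning rate, or the sequences themselves, so the Python sketch below fills those in with assumptions. It assumes one-hot encoding (four input units per base, so 232 inputs for a 58-base window), four sigmoid hidden units, and online back-propagation on squared error toward targets of 1.0 and 0.0. Its positives are synthetic toy "promoters" made by planting the E. coli consensus hexamers TTGACA and TATAAT 17 bases apart in 60% A+T random background. The names `one_hot`, `random_at_rich`, `toy_positive`, `ThreeLayerNet`, `train` and `scan`, and the hexamer offsets 15 and 38, are hypothetical. The toy data stand in for, and are not, the 39 promoters and point mutants used in the paper.

```python
import math
import random

BASES = "ACGT"
WINDOW = 58        # input length from the abstract (about -50 to +8)
MINUS35_AT = 15    # hypothetical window index of a -35 hexamer
MINUS10_AT = 38    # hypothetical window index of a -10 hexamer (17-base spacer)

def one_hot(seq):
    """Flatten a DNA string into 4 input units per base (encoding is assumed)."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def random_at_rich(length, rng, at_fraction=0.6):
    """Random sequence with expected A+T content at_fraction (negative inputs)."""
    return "".join(
        rng.choice("AT") if rng.random() < at_fraction else rng.choice("GC")
        for _ in range(length)
    )

def toy_positive(rng):
    """Synthetic stand-in for a promoter: consensus hexamers in AT-rich noise."""
    s = list(random_at_rich(WINDOW, rng))
    s[MINUS35_AT:MINUS35_AT + 6] = "TTGACA"
    s[MINUS10_AT:MINUS10_AT + 6] = "TATAAT"
    return "".join(s)

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

class ThreeLayerNet:
    """Input layer -> sigmoid hidden layer -> single sigmoid output unit."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One online back-propagation step on E = 0.5 * (y - target)**2."""
        h, y = self.forward(x)
        d_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights before they are updated.
        d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * d_out * hj
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hid):
            row = self.w1[j]
            for i, xi in enumerate(x):
                if xi:  # one-hot inputs: inactive units contribute no gradient
                    row[i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return y

def train(net, pairs, epochs, rng, lr=0.5):
    """Present (input, target) pairs in shuffled order for several epochs."""
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, t in pairs:
            net.train_step(x, t, lr)

def scan(net, genome, threshold=0.5):
    """Score every WINDOW-length window; return (start, score) above threshold."""
    hits = []
    for start in range(len(genome) - WINDOW + 1):
        _, score = net.forward(one_hot(genome[start:start + WINDOW]))
        if score > threshold:
            hits.append((start, score))
    return hits

if __name__ == "__main__":
    rng = random.Random(1)
    data = [(one_hot(toy_positive(rng)), 1.0) for _ in range(40)]
    data += [(one_hot(random_at_rich(WINDOW, rng)), 0.0) for _ in range(40)]
    net = ThreeLayerNet(4 * WINDOW)
    train(net, data, epochs=30, rng=rng)
    genome = random_at_rich(100, rng) + toy_positive(rng) + random_at_rich(100, rng)
    print(scan(net, genome))
```

`scan` imitates the kind of sliding-window search the abstract reports for pBR322 and lambda: it scores every 58-base window and keeps those above a threshold. Because the inputs are one-hot, each row of `net.w1` can be read as a 58 × 4 position-by-base table of positive and negative weights. This is one assumed way to inspect per-position base preferences in a network like this one.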