Bisant D, Maizel J
Neuroscience Program (151 B), Stanford University, CA 94305, USA.
Nucleic Acids Res. 1995 May 11;23(9):1632-9. doi: 10.1093/nar/23.9.1632.
This study investigated the use of neural networks to identify Escherichia coli ribosome binding sites. Recognizing these sites from primary sequence data is difficult because multiple determinants define them. In addition, secondary structure plays a significant role in determining the site, and this information is difficult to include in the models. Efforts to solve this problem have so far yielded poor results. A new compilation of E. coli ribosome binding sites was generated for this study, and feedforward backpropagation networks were applied to their identification. Perceptrons were also applied, since they had been the best available method since 1982. The performance of all the neural networks and perceptrons was evaluated by ROC analysis. The neural network significantly improved recognition of these sites compared with the previous best method, finding fewer than half as many false positives when both models were adjusted to find an equal number of actual sites. The best neural network used an input window of 101 nucleotides and a single hidden layer of 9 units. Both the neural network and the perceptron trained on the new compilation performed better than the original perceptron published by Stormo et al. in 1982.
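To make the reported architecture and comparison concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a one-hot encoding of four inputs per nucleotide and sigmoid units, neither of which is stated in the abstract; only the 101-nucleotide window and the 9 hidden units come from the text. The function names (`one_hot`, `init_network`, `forward`, `false_positives_at_matched_hits`) are hypothetical, and the weights are random and untrained.

```python
import math
import random

ALPHABET = "ACGT"
WINDOW = 101  # input window size reported in the abstract
HIDDEN = 9    # hidden-layer size reported in the abstract


def one_hot(window):
    """Encode a window as a flat vector, 4 inputs per base (assumed encoding)."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} nucleotides, got {len(window)}")
    vec = []
    for base in window.upper():
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(seed=0):
    """Random, untrained weights: 404 inputs -> 9 hidden units -> 1 output."""
    rng = random.Random(seed)
    n_in = WINDOW * len(ALPHABET)
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(HIDDEN)]
    b_hidden = [0.0] * HIDDEN
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
    return w_hidden, b_hidden, w_out, 0.0


def forward(net, x):
    """Feedforward pass; sigmoid activations are an assumption."""
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def false_positives_at_matched_hits(pos_scores, neg_scores, n_hits):
    """Set the threshold at the n_hits-th best true site, then count
    non-sites scoring at or above it. Tied scores among true sites can
    make the number of sites found exceed n_hits."""
    if not 1 <= n_hits <= len(pos_scores):
        raise ValueError("n_hits must be between 1 and the number of true sites")
    threshold = sorted(pos_scores, reverse=True)[n_hits - 1]
    return sum(1 for s in neg_scores if s >= threshold)
```

Calling `false_positives_at_matched_hits` once per model with the same `n_hits` gives the kind of comparison the abstract describes: each model's threshold is adjusted until it finds the same number of actual sites, and the false-positive counts are then compared directly.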