Wu C, Berry M, Fung Y S, McLarty J
Department of Epidemiology/Biomathematics, University of Texas Health Center at Tyler 75710, USA.
Proc Int Conf Intell Syst Mol Biol. 1993;1:429-37.
A neural network classification method has been developed as an alternative approach to the search/organization problem of large molecular databases. Two artificial neural systems have been implemented on a Cray supercomputer for rapid protein and nucleic acid sequence classification. The neural networks are three-layered, feed-forward networks that employ the back-propagation learning algorithm. The molecular sequences are encoded into neural input vectors by applying either an n-gram hashing method or an SVD (singular value decomposition) method. Once trained with known sequences in the molecular databases, the neural system becomes an associative memory capable of classifying unknown sequences based on the class information embedded in its neural interconnections. The protein system, which classifies proteins into PIR (Protein Identification Resource) superfamilies, showed a sensitivity ranging from 82% to close to 100% at a speed about an order of magnitude faster than other search methods. The pilot nucleic acid system, which classifies ribosomal RNA sequences according to phylogenetic groups, achieved 100% classification accuracy. The system could be used to reduce database search time and help organize molecular sequence databases. The tool is generally applicable to any database organized according to family relationships.
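The abstract does not specify the exact encoding or network configuration. The Python sketch below illustrates the general idea under stated assumptions. Overlapping n-gram counts over the 20-letter amino-acid alphabet are hashed to vector positions by base-20 positional indexing and normalized to frequencies. These vectors then feed a three-layer sigmoid network trained by online back-propagation on squared error. Several details are hypothetical: the names `encode_ngrams`, `ngram_index`, and `ThreeLayerNet`, the alphabet, layer sizes, learning rate, initialization, and the toy training sequences. The SVD encoding is not shown.

```python
import math
import random
from typing import List, Tuple

# Assumption: the alphabet is the 20 standard amino-acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_index(ngram: str, alphabet: str = AMINO_ACIDS) -> int:
    """Hash an n-gram to a unique slot in [0, len(alphabet) ** n) via base-k positional encoding."""
    index = 0
    for symbol in ngram:
        index = index * len(alphabet) + alphabet.index(symbol)
    return index


def encode_ngrams(sequence: str, n: int = 2, alphabet: str = AMINO_ACIDS) -> List[float]:
    """Hypothetical encoder: count overlapping n-grams, then normalize counts to frequencies.

    Assumption: any n-gram containing a symbol outside the alphabet is skipped.
    """
    vector = [0.0] * (len(alphabet) ** n)
    total = 0
    for start in range(len(sequence) - n + 1):
        ngram = sequence[start:start + n]
        if all(symbol in alphabet for symbol in ngram):
            vector[ngram_index(ngram, alphabet)] += 1.0
            total += 1
    return [count / total for count in vector] if total else vector


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerNet:
    """Input layer -> sigmoid hidden layer -> sigmoid output layer (one unit per class)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # assumed small uniform initialization
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                  for row, b in zip(self.w2, self.b2)]
        return hidden, output

    def train_step(self, x: List[float], target: List[float], lr: float = 0.5) -> float:
        """One back-propagation update on squared error; returns the loss before the update."""
        hidden, output = self.forward(x)
        d_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
        # Hidden deltas are computed from the output weights before those weights change.
        d_hid = [h * (1.0 - h) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        for k, d in enumerate(d_out):
            self.w2[k] = [w - lr * d * h for w, h in zip(self.w2[k], hidden)]
            self.b2[k] -= lr * d
        for j, d in enumerate(d_hid):
            self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * d
        return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))

    def classify(self, x: List[float]) -> int:
        """Return the index of the most active output unit, i.e. the predicted family."""
        _, output = self.forward(x)
        return max(range(len(output)), key=output.__getitem__)


# Toy usage with two invented "families" (illustration only, not data from the paper).
training_set = [(encode_ngrams("AAAAAAAA"), 0), (encode_ngrams("CCCCCCCC"), 1)]
net = ThreeLayerNet(n_in=len(AMINO_ACIDS) ** 2, n_hidden=4, n_out=2)
for epoch in range(1000):
    for x, label in training_set:
        net.train_step(x, [1.0 if k == label else 0.0 for k in range(2)])
print(net.classify(encode_ngrams("AAAA")), net.classify(encode_ngrams("CCCCC")))
```

Each output unit stands for one family (a PIR superfamily, in the protein system). Once training is done, classifying an unknown sequence takes a single forward pass rather than a pairwise comparison against every database entry. This is consistent with the speed advantage the abstract reports.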