Clustering proteins into families using artificial neural networks.

Author information

Ferrán E A, Ferrara P

Affiliation

Sanofi Elf Bio Recherches, Labège Innopole, France.

Publication information

Comput Appl Biosci. 1992 Feb;8(1):39-44. doi: 10.1093/bioinformatics/8.1.39.

DOI:10.1093/bioinformatics/8.1.39

PMID:1314686

Abstract

An artificial neural network was used to cluster proteins into families. The network, composed of 7 x 7 neurons, was trained with the Kohonen unsupervised learning algorithm using, as inputs, matrix patterns derived from the bipeptide composition of 447 proteins, belonging to 13 different families. As a result of the training, and without any a priori indication of the number or composition of the expected families, the network self-organized the activation of its neurons into topologically ordered maps in which almost all the proteins (96.7%) were correctly clustered into the corresponding families. In a second computational experiment, a similar network was trained with one family of the previous learning set (76 cytochrome c sequences). The new neural map clustered these proteins into 25 different neurons (five in the first experiment), wherein phylogenetically related sequences were positioned close to each other. This result shows that the network can adapt the clustering resolution to the complexity of the learning set, a useful feature when working with an unknown number of clusters. Although the learning stage is time consuming, once the topological map is obtained, the classification of new proteins is very fast. Altogether, our results suggest that this novel approach may be a useful tool to organize the search for homologies in large macromolecular databases.
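The two steps named in the abstract, encoding each protein by its bipeptide composition and training a 7 x 7 Kohonen map on those patterns, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian neighborhood, the linear decay of learning rate and radius, the epoch count, the function names, and the toy sequences are all assumptions chosen for illustration, and the abstract does not specify how the input matrices were normalized.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def bipeptide_composition(seq):
    """Flattened 20 x 20 matrix of adjacent residue-pair frequencies.

    Normalizing by the number of counted pairs is an assumption.
    """
    counts = [0.0] * 400
    total = 0
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip non-standard symbols
            counts[INDEX[a] * 20 + INDEX[b]] += 1.0
            total += 1
    if total == 0:
        return counts
    return [c / total for c in counts]


def best_matching_unit(weights, x):
    """Grid position of the neuron closest to x (squared Euclidean distance)."""
    best, best_d = None, math.inf
    for pos, w in weights.items():
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = pos, d
    return best


def train_som(patterns, rows=7, cols=7, epochs=20, lr0=0.5, radius0=3.0, seed=0):
    """Train a small Kohonen map; hyperparameters are illustrative guesses."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    weights = {
        (r, c): [rng.random() / dim for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks over time
        order = list(patterns)
        rng.shuffle(order)
        for x in order:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Toy usage with made-up sequences (not data from the paper).
toy_sequences = ["ACDEFGHIKACDEFGHIK", "ACDEFGHIKLACDEFGHI", "WYWYVVTTSSWYWYVV"]
toy_patterns = [bipeptide_composition(s) for s in toy_sequences]
som = train_som(toy_patterns, epochs=5)
for s, p in zip(toy_sequences, toy_patterns):
    print(s, "->", best_matching_unit(som, p))
```

Once the map is trained, classifying a new protein only requires one `best_matching_unit` lookup over the 49 neurons, which is consistent with the abstract's remark that classification is fast compared with training.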
