A hybrid method to cluster protein sequences based on statistics and artificial neural networks.

Author information

Ferrán E A, Pflugfelder B

Affiliation

Sanofi Elf Bio Recherches, Labège Innopole, France.

Publication information

Comput Appl Biosci. 1993 Dec;9(6):671-80. doi: 10.1093/bioinformatics/9.6.671.

Abstract

We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using as inputs matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. First, we propose a statistical method to cluster a set of bipeptidic matrices into families. It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters, and (iii) final classification of the bipeptidic matrices into M clusters. Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method that combines the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not differ essentially from the ones obtained with the initial ANN approach.
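
The bipeptidic matrix used as input can be illustrated with a minimal sketch. The abstract does not give the exact construction, so this assumes a 20x20 matrix over the standard amino acid alphabet, with each entry holding the relative frequency of one ordered pair of adjacent residues. The names `bipeptide_matrix` and `flatten` are hypothetical, and the normalization by the total pair count is an assumption rather than the authors' stated procedure.

```python
# Illustrative sketch only: the normalization and alphabet order are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def bipeptide_matrix(sequence):
    """Return a 20x20 matrix of relative frequencies of adjacent residue pairs."""
    counts = [[0.0] * 20 for _ in range(20)]
    total = 0
    # Slide a window of width 2 along the sequence.
    for first, second in zip(sequence, sequence[1:]):
        # Skip pairs containing non-standard symbols (e.g. X or B).
        if first in INDEX and second in INDEX:
            counts[INDEX[first]][INDEX[second]] += 1
            total += 1
    if total == 0:
        return counts  # empty or too-short sequence: all zeros
    return [[c / total for c in row] for row in counts]


def flatten(matrix):
    """Flatten the matrix row by row into a 400-component vector."""
    return [value for row in matrix for value in row]
```

Under these assumptions, each sequence becomes a 400-component vector regardless of its length, which is the kind of fixed-size pattern that principal component analysis or a network input layer can take.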
