Topological maps of protein sequences.

Authors

Ferrán E A, Ferrara P

Affiliation

Sanofi Elf Bio Recherches, Labège Innopole, France.

Publication

Biol Cybern. 1991;65(6):451-8. doi: 10.1007/BF00204658.

DOI: 10.1007/BF00204658
PMID: 1958730

Abstract

A new method based on neural networks to cluster proteins into families is described. The network is trained with the Kohonen unsupervised learning algorithm, using matrix pattern representations of the protein sequences as inputs. The components (x, y) of these 20 x 20 matrix patterns are the normalized frequencies of all pairs xy of amino acids in each sequence. We investigate the influence of different learning parameters in the final topological maps obtained with a learning set of ten proteins belonging to three established families. In all cases, except in those where the synaptic vectors remain nearly unchanged during learning, the ten proteins are correctly classified into the expected families. The classification by the trained network of mutated or incomplete sequences of the learned proteins is also analysed. The neural network gives a correct classification for a sequence mutated in 21.5% +/- 7% of its amino acids and for fragments representing 7.5% +/- 3% of the original sequence. Similar results were obtained with a learning set of 32 proteins belonging to 15 families. These results show that a neural network can be trained following the Kohonen algorithm to obtain topological maps of protein sequences, where related proteins are finally associated with the same winner neuron or with neighboring ones, and that the trained network can be applied to rapidly classify new sequences. This approach opens new possibilities to find rapid and efficient algorithms to organize and search for homologies in the whole protein database.
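
The approach summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract specifies only the 20 x 20 pair-frequency input and the Kohonen learning rule, so everything else here is an assumption made for illustration: counting adjacent residue pairs, scaling each pattern to unit Euclidean length, the 3 x 3 map, the linear decay of the learning rate, the square neighbourhood, and the function names `dipeptide_pattern`, `winner`, and `train_som`.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dipeptide_pattern(sequence):
    """Flattened 20 x 20 matrix of adjacent-pair counts, scaled to unit length.

    Assumption: "normalized" is taken to mean unit Euclidean norm.
    """
    counts = [0.0] * 400
    for x, y in zip(sequence, sequence[1:]):
        if x in INDEX and y in INDEX:  # skip non-standard symbols
            counts[INDEX[x] * 20 + INDEX[y]] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def winner(weights, pattern):
    """Grid position of the neuron whose weight vector is closest to the pattern."""
    def dist2(pos):
        return sum((w - p) ** 2 for w, p in zip(weights[pos], pattern))
    return min(weights, key=dist2)


def train_som(patterns, rows=3, cols=3, epochs=50, alpha0=0.5, radius0=1, seed=0):
    """Train a small Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    weights = {(r, c): [rng.random() for _ in range(400)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        alpha = alpha0 * frac           # learning rate decays linearly (assumed schedule)
        radius = round(radius0 * frac)  # neighbourhood shrinks towards the winner alone
        for pattern in patterns:
            wr, wc = winner(weights, pattern)
            for (r, c), w in weights.items():
                if max(abs(r - wr), abs(c - wc)) <= radius:  # square neighbourhood
                    for i in range(400):
                        w[i] += alpha * (pattern[i] - w[i])
    return weights


# Toy usage with made-up sequences (not the proteins used in the paper).
toy = {"seq_a": "MKVLAAGIVGLLLAQ", "seq_b": "MKVLAAGIVALLLAQ", "seq_c": "GSHMDEEDRRKKWWYY"}
som = train_som([dipeptide_pattern(s) for s in toy.values()])
for name, seq in toy.items():
    print(name, winner(som, dipeptide_pattern(seq)))
```

Classifying a new, mutated, or truncated sequence amounts to encoding it with `dipeptide_pattern` and looking up its `winner` on the trained map. Sequences that land on the same or neighbouring grid positions are candidates for the same family.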

Similar articles

1. Topological maps of protein sequences.
   Biol Cybern. 1991;65(6):451-8. doi: 10.1007/BF00204658.
2. Self-organized neural maps of human protein sequences.
   Protein Sci. 1994 Mar;3(3):507-21. doi: 10.1002/pro.5560030316.
3. Clustering proteins into families using artificial neural networks.
   Comput Appl Biosci. 1992 Feb;8(1):39-44. doi: 10.1093/bioinformatics/8.1.39.
4. Protein classification using neural networks.
   Proc Int Conf Intell Syst Mol Biol. 1993;1:127-35.
5. A hybrid method to cluster protein sequences based on statistics and artificial neural networks.
   Comput Appl Biosci. 1993 Dec;9(6):671-80. doi: 10.1093/bioinformatics/9.6.671.
6. Kohonen map as a visualization tool for the analysis of protein sequences: multiple alignments, domains and segments of secondary structures.
   Comput Appl Biosci. 1996 Dec;12(6):447-54. doi: 10.1093/bioinformatics/12.6.447.
7. Machine learning can be used to distinguish protein families and generate new proteins belonging to those families.
   J Chem Phys. 2019 Nov 7;151(17):175102. doi: 10.1063/1.5126225.
8. Local structural motifs of protein backbones are classified by self-organizing neural networks.
   Protein Eng. 1996 Oct;9(10):833-42. doi: 10.1093/protein/9.10.833.
9. Structural SCOP superfamily level classification using unsupervised machine learning.
   IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):601-8. doi: 10.1109/TCBB.2011.114. Epub 2011 Aug 4.
10. Prediction of contact maps with neural networks and correlated mutations.
   Protein Eng. 2001 Nov;14(11):835-43. doi: 10.1093/protein/14.11.835.

Cited by

1. AlignScape, displaying sequence similarity using self-organizing maps.
   Front Bioinform. 2024 Jan 26;4:1321508. doi: 10.3389/fbinf.2024.1321508. eCollection 2024.
2. SOMMER: self-organising maps for education and research.
   J Mol Model. 2007 Jan;13(1):225-8. doi: 10.1007/s00894-006-0140-0. Epub 2006 Sep 22.
3. Self-organizing tree-growing network for the classification of protein sequences.
   Protein Sci. 1998 Dec;7(12):2613-22. doi: 10.1002/pro.5560071215.
4. Self-organizing hierarchic networks for pattern recognition in protein sequence.
   Protein Sci. 1996 Jan;5(1):72-82. doi: 10.1002/pro.5560050109.
5. Self-organized neural maps of human protein sequences.
   Protein Sci. 1994 Mar;3(3):507-21. doi: 10.1002/pro.5560030316.

References

1. Identification of common molecular subsequences.
   J Mol Biol. 1981 Mar 25;147(1):195-7. doi: 10.1016/0022-2836(81)90087-5.
2. Rapid similarity searches of nucleic acid and protein data banks.
   Proc Natl Acad Sci U S A. 1983 Feb;80(3):726-30. doi: 10.1073/pnas.80.3.726.
3. A comprehensive set of sequence analysis programs for the VAX.
   Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387-95. doi: 10.1093/nar/12.1part1.387.
4. Pattern recognition in several sequences: consensus and alignment.
   Bull Math Biol. 1984;46(4):515-27. doi: 10.1007/BF02459500.
5. A general method applicable to the search for similarities in the amino acid sequence of two proteins.
   J Mol Biol. 1970 Mar;48(3):443-53. doi: 10.1016/0022-2836(70)90057-4.
6. Method for clustering proteins by use of all possible pairs of amino acids as structural descriptors.
   J Chem Inf Comput Sci. 1988 May;28(2):72-8. doi: 10.1021/ci00058a006.
7. Predicting the secondary structure of globular proteins using neural network models.
   J Mol Biol. 1988 Aug 20;202(4):865-84. doi: 10.1016/0022-2836(88)90564-5.
8. Multiple sequence alignment with hierarchical clustering.
   Nucleic Acids Res. 1988 Nov 25;16(22):10881-90. doi: 10.1093/nar/16.22.10881.
9. Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network.
   Comput Appl Biosci. 1990 Oct;6(4):319-24. doi: 10.1093/bioinformatics/6.4.319.
10. Protein database searches for multiple alignments.
   Proc Natl Acad Sci U S A. 1990 Jul;87(14):5509-13. doi: 10.1073/pnas.87.14.5509.