Dombi G W, Lawrence J
Surgery Department, Wayne State University, Detroit, Michigan 48201.
Protein Sci. 1994 Apr;3(4):557-66. doi: 10.1002/pro.5560030404.
Neural networks were used to generalize common themes found in transmembrane-spanning protein helices. Various-sized databases were used containing nonoverlapping sequences, each 25 amino acids long. Training consisted of sorting these sequences into 1 of 2 groups: transmembrane helical peptides or nontransmembrane peptides. Learning was measured using a test set 10% the size of the training set. As training set size increased from 214 sequences to 1,751 sequences, learning increased in a nonlinear manner from 75% to a high of 98%, then declined to a low of 87%. The final training database consisted of roughly equal numbers of transmembrane (928) and nontransmembrane (1,018) sequences. All transmembrane sequences were entered into the database with respect to their lipid membrane orientation: from inside the membrane to outside. Generalized transmembrane helix and nontransmembrane peptides were constructed from the maximally weighted connecting strengths of fully trained networks. Four generalized transmembrane helices were found to contain 9 consensus residues: a K-R-F triplet was found at the inside lipid interface, 2 isoleucine and 2 other phenylalanine residues were present in the helical body, and 2 tryptophan residues were found near the outside lipid interface. As a test of the training method, bacteriorhodopsin was examined to determine the position of its 7 transmembrane helices.
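The data preparation described above (nonoverlapping peptides, each 25 amino acids long, sorted into two classes) can be illustrated with a minimal sketch. The abstract does not say how residues were presented to the network, so the one-hot encoding here (20 inputs per residue, 500 per peptide) is an assumption, as are the function names. The overlapping scan in `sliding_windows` is likewise only one plausible way a trained classifier could be run along a whole protein such as bacteriorhodopsin; it is not taken from the source.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
WINDOW = 25  # peptide length stated in the abstract


def nonoverlapping_windows(sequence, size=WINDOW):
    """Split a protein sequence into consecutive, nonoverlapping peptides."""
    usable = len(sequence) - len(sequence) % size  # drop a trailing partial window
    return [sequence[i:i + size] for i in range(0, usable, size)]


def one_hot(peptide):
    """Encode a peptide as a flat 0/1 vector, 20 entries per residue.

    Assumed encoding: the abstract does not specify the input representation.
    """
    vector = []
    for residue in peptide:
        column = [0] * len(AMINO_ACIDS)
        column[AMINO_ACIDS.index(residue)] = 1
        vector.extend(column)
    return vector


def sliding_windows(sequence, size=WINDOW):
    """Yield (start, peptide) for every overlapping window of a sequence.

    Hypothetical scanning scheme for locating helices in a full protein.
    """
    for start in range(len(sequence) - size + 1):
        yield start, sequence[start:start + size]
```

For a toy sequence of 60 residues, `nonoverlapping_windows` returns two 25-residue peptides and discards the last 10 residues; each peptide then encodes to a 500-element vector that a two-class network could take as input.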