a Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran.
b School of BioMedical Sciences and the Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, UK.
J Biomol Struct Dyn. 2018 Feb;36(2):443-464. doi: 10.1080/07391102.2017.1285725. Epub 2017 Feb 15.
We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.
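The summary statistics quoted in the abstract (median, average ± standard deviation, most common helix length, and helix content as a percentage of the full protein) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the inclusive `(start, end)` segment convention and the toy coordinates are all assumptions made for illustration. Only the final check, 81 discrepant helices out of 1551, uses numbers taken from the abstract.

```python
from statistics import mean, median, mode, stdev


def helix_lengths(segments):
    """Return helix lengths for (start, end) residue positions, both inclusive (assumed convention)."""
    return [end - start + 1 for start, end in segments]


def summarize(lengths):
    """Median, mean, sample standard deviation and most common value of a list of helix lengths."""
    return {
        "median": median(lengths),
        "mean": mean(lengths),
        "sd": stdev(lengths),
        "mode": mode(lengths),
    }


def tm_content_percent(segments, protein_length):
    """Residues located in transmembrane helices as a percentage of the full protein length."""
    return 100.0 * sum(helix_lengths(segments)) / protein_length


# Hypothetical protein: 200 residues with four transmembrane helices (toy data, not from the paper).
toy_segments = [(10, 32), (45, 67), (80, 105), (120, 140)]
toy_lengths = helix_lengths(toy_segments)
toy_summary = summarize(toy_lengths)
toy_content = tm_content_percent(toy_segments, 200)

# Figures reported in the abstract: 81 missing or additional helices out of 1551.
discrepancy_percent = round(100.0 * 81 / 1551, 1)

print(toy_lengths, toy_summary, toy_content, discrepancy_percent)
```

Applying the same functions to each subgroup (by organism type, or by receptors, channels, transporters and others) would give per-group values comparable to the overall ones, assuming helix boundaries are assigned consistently across structures.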