Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China, Department of Computational Medicine and Bioinformatics and Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2013 Oct 15;29(20):2579-87. doi: 10.1093/bioinformatics/btt440. Epub 2013 Aug 14.
Residue-residue contacts across the transmembrane helices dictate the three-dimensional topology of alpha-helical membrane proteins. However, contact determination through experiments is difficult because most transmembrane proteins are hard to crystallize.
We present a novel method (MemBrain) to derive transmembrane inter-helix contacts from amino acid sequences by combining correlated mutations and multiple machine learning classifiers. Tested on 60 non-redundant polytopic proteins using a strict leave-one-out cross-validation protocol, MemBrain achieves an average accuracy of 62%, which is 12.5% higher than the current best method from the literature. When applied to 13 recently solved G protein-coupled receptors, the MemBrain contact predictions helped increase the TM-score of the I-TASSER models by 37% in the transmembrane region. With all G protein-coupled receptor templates and homologous templates with sequence identity >30% excluded, the number of foldable cases (TM-score >0.5) increased by 100%. These results demonstrate significant progress in contact prediction and a potential for contact-driven structure modeling of transmembrane proteins.
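As a rough illustration of what a correlated-mutation signal looks like, the sketch below scores a pair of alignment columns by their mutual information. This is an assumption made for illustration only: the abstract does not say which covariation measure MemBrain uses, and the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    Illustrative stand-in for a correlated-mutation score; not necessarily
    the measure used by MemBrain.
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)   # residue counts in column i
    col_j = Counter(seq[j] for seq in msa)   # residue counts in column j
    pairs = Counter((seq[i], seq[j]) for seq in msa)  # joint counts

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 change together,
# columns 0 and 3 vary independently, column 1 is fully conserved.
toy_msa = ["ALKV", "GLRV", "ALKI", "GLRI"]

covarying = column_mutual_information(toy_msa, 0, 2)
independent = column_mutual_information(toy_msa, 0, 3)
```

In a pipeline of the kind the abstract describes, pairwise scores like these would be one input feature among others passed to the machine learning classifiers, rather than a contact prediction on their own.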